Two Embedding Theorems for Data with Equivalences under Finite Group Action



Fabian Lim* 

Research Laboratory of Electronics, MIT, Cambridge, MA 02139, USA 

flim@mit.edu 



Abstract 

There is recent interest in compressing data sets for non-sequential settings, where the lack of
obvious orderings on the data space requires notions of data equivalence to be considered.
For example, Varshney & Goyal (DCC, 2006) considered multiset equivalences, while Choi
& Szpankowski (IEEE Trans. IT, 2012) considered isomorphic equivalences in graphs. Here
equivalences are considered under a relatively broad framework - finite-dimensional,
non-sequential data spaces with equivalences under group action, for which analogues of two
well-studied embedding theorems are derived: the Whitney embedding theorem and the
Johnson-Lindenstrauss lemma. Only the canonical data points need to be carefully embedded, each such
point representing a set of data points equivalent under group action. Two-step embeddings
are considered. First, a group invariant is applied to account for equivalences, and second,
a linear embedding takes it down to low dimensions. Our results require hypotheses on the
discriminability of the applied invariant; such notions are related to separating invariants
(Dufresne, 2008) and completeness in pattern recognition (Kakarala, 1992).

Our first theorem shows that almost all such two-step embeddings can one-to-one embed the
canonical part of a bounded, discriminable set of data points, if the embedding dimension exceeds
$2k$, where $k$ is the box-counting dimension of the set closure of canonical data points. Our second
theorem shows that for $k$ equal to the number of canonical points of a finite data set, a randomly
sampled two-step embedding preserves isometries (of the canonical part) up to factors $1 \pm \epsilon$
with probability at least $1 - \beta$, if the embedding dimension exceeds
$(2\log k + \log(1/\beta))/\alpha(\epsilon, \delta)$ for some function $\alpha$, where $\delta$ is a
positive constant capturing a certain discriminability property of the invariant. In both theorems,
the value $k$ is tied only to the canonical part, which may be significantly smaller than the ambient
data dimension, by up to a factor equal to the size of the group.

Figure 1: In (a), an exercise illustrating equivalences between three types of "non-conventional"
data (for answers see below). In (b), accounting for data equivalences while performing embeddings.



1 Introduction 

A discrete finite sequence is arguably the most generic mathematical representation for
finite-dimensional data. However, of recent interest are data sets where it is unclear how to
appropriately assign sequence orderings to the data space. For example, ranking data lives on a space
of index subsets, which has no meaningful ordering [13]. Graphical data lives on a space of graph
edges, and node labellings may often be irrelevant [9, 18]. Quotient spaces that describe matrix
manifolds, e.g., the Grassmann manifold, have equivalence classes as elements [1].

We refer to such data sets as non-sequential, emphasizing the lack of ordering on their data
space. For such sets, data compression becomes challenging, because we need to identify
which seemingly different data points actually convey the same information. This is illustrated in
Figure 1(a), whereby in each row, two (and only two) pictures are essentially the same (equivalent)
but portrayed to appear different. Can you tell which two? The first row is designed to be an easy
example; the second row requires more time, and the third row is probably too difficult for
the human eye. These examples are not arbitrary; in fact they correspond to three previously studied
"non-conventional" data models - the choice model [15], the Ehrenfest diffusion model (see [25], p.
5), and the graphical model (see [9, 18, 19]).



In this paper we extend low-dimensional linear embedding techniques [2, 3, 5, 6, 27] to the
above-mentioned non-sequential data models - more specifically, to finite-dimensional spaces where
data equivalences result from a finite group action. We consider a two-step embedding process,
illustrated in Figure 1(b). In the first step, we utilize a special function which produces the same
output if two data sets are equivalent (under this group action); such a function, termed an
invariant, accounts for data equivalence. Note however that the converse may not always hold, i.e.,
two data sets producing the same output may not always be equivalent; such converses are related to
separating invariants [14] and completeness in pattern recognition [17-19]. In the second step, a
linear embedding is applied on the output of step one, to move the data to the low-dimensional space.
The interest here is to obtain embedding guarantees, to support the use of such techniques as a
kind of compression scheme. This has to be done with hypotheses on the discriminative power of
the applied invariant, as an appropriate one-to-one embedding is not possible if the converse does
not hold for any two data points of interest.

Main results: We extend two embedding theorems to finite-dimensional, non-sequential data
spaces $\mathbb{R}[X]$, discussed here for the case where the group $\mathcal{G}$ acts by permutation
action. Let $\mathcal{R}$ denote a subset of $\mathbb{R}[X]$ that contains canonical data points,
canonical under equivalence by action of $\mathcal{G}$. Then for a bounded set $V$ of data points
(possibly infinite), assuming that the subset $V_{\mathcal{R}}$ of canonical points ($V$ "projected"
onto $\mathcal{R}$) is discriminable by the invariant (i.e., satisfies the converse property), our
extension (Theorem 3.1) of the Whitney embedding theorem shows that almost all such two-step
embeddings can one-to-one embed $V_{\mathcal{R}}$, if the embedding dimension exceeds $2k$, where $k$
is the box-counting dimension of the set closure of $V_{\mathcal{R}}$. For a finite set $V$ of data
points, our extension (Theorem 3.2) of the Johnson-Lindenstrauss lemma shows that a randomly sampled
linear embedding preserves isometries up to factors $1 \pm \epsilon$ with probability at least
$1 - \beta$, if the embedding dimension exceeds $(2\log k + \log(1/\beta))/\alpha(\epsilon, \delta)$
for some function $\alpha$, where $\delta$ is a positive constant that upper-bounds a to-be-defined
undiscriminable fraction between any two canonical points in $V_{\mathcal{R}}$. The value $k$,
measuring the size of the set $V_{\mathcal{R}}$ of canonical points, may be much smaller than that of
the whole set $V$, by up to a factor $\#\mathcal{G}$ (the group size). All proofs are simple and
require little knowledge of invariant theory, facilitated by making obvious linear properties of
invariants over a tensor space.

Significance of this work: These techniques are suited for database compression of
non-sequential data, e.g., DNA fragments, chemical molecular compositions, web-graph connections,
records of intervallic events, etc. Here the models admit any type of finite group (permutation)
action - more general than the specific cases considered in [9, 26]. Extensions to any matrix group
action seem feasible - to be pursued in future work. A synergistic relationship is developed between
linear embeddings and (data) invariants, whereby this work can be viewed as an adaptation of
invariants for low-dimensional data in high-dimensional ambient spaces. Provable guarantees are
provided on the required storage complexity (embedding dimension), tied directly to the size of the
data set. The invariant used in the first embedding step does not determine this complexity; it only
needs to satisfy the discriminability hypothesis. While probabilistic data models are typically used
in past related works [9, 21, 26], they are not required here. We discuss invariants with
polynomial-time computational complexity, being at most $mn^\omega$ where $m$ is the embedding
dimension, $n$ is the data dimension of the model used, and $\omega \geq 1$. Compare with
representation theoretic transform-type invariants (see [17-19]), where these methods require
complexity of at least $O((\#\mathcal{G})^2)$ to execute the fast transforms, a potentially large
number if the group size $\#\mathcal{G}$ is huge ($\#\mathcal{G}$ may even be super-exponential in
$n$ for permutation groups, see [18], ch. 3 & 7).

More discussion on related prior work: Non-sequential data sets have been of interest for
some years now, in pattern recognition [17], probability theory [13], machine learning [13, 16, 18],
optimization [1, 7], choice models [15], etc. Our interest in linear embeddings is due to the wealth
of recent interest on this topic, e.g., compressed sensing [6]. For invariant functions, the key area
is invariant theory [12, 14], though there exist other guises, e.g., convex graphical invariants [7]
and triple-correlation [17, 18]; see also the survey article [28]. One of the main applications of
invariant theory is classification, and characterization of discriminative ability is of recent
focus, see Dufresne's Ph.D. thesis [14]. For finite groups, a key result is that the set of all
canonical points is in bijection with an affine algebraic variety corresponding to the ideal of
relations, see [12], pp. 345-353; however the best known complexity bound is super-exponential in the
number of data dimensions $n$. For triple-correlation and equivalences under compact groups, Kakarala
in his Ph.D. thesis characterized the discriminative ability under certain conditions [17]. Kakarala
uses representation theoretic techniques known as Tannaka-Krein duality. The difficulty in obtaining
computationally efficient invariants with absolute discriminative ability is appreciated by observing
that even for the specific class of graphical invariants, a polynomial-time algorithm for graph
isomorphism is still unknown for general graphs.



The work [26] is mainly an information theoretic study; for an efficient algorithm specialized
for multisets see [21]. In [9] a very efficient $O(\ell^2)$ algorithm specialized for compressing
$\ell$-node graphs is given, though their algorithm cannot be used as a graphical invariant. In both
[9, 26], the dimension required for appropriate compression is similar to that of our
Johnson-Lindenstrauss lemma (Theorem 3.2) - there will be savings logarithmic, $\log(\#\mathcal{G})$,
in group size. For representation theoretic methods, partial labellings of graphical data are
considered in [19].



For triple-correlations, Kakarala's proof in [17] is non-constructive, so an algorithm to invert
an invariant function does not exist in general. However, invariant theory shows that the set of
canonical points has a manifold, or algebraic variety, structure. Thus a possible future direction
- inspired by compressed sensing - is to consider manifold optimization techniques (e.g., [1]) to
perform inversion. In pattern recognition, correlation-type invariants are usually treated
separately from invariant theory; however they are related to polynomial functions from an invariant
ring. Do note that correlation invariants are restricted to transitive permutation group actions
(where we say the data space is homogeneous). Also, as Kondor pointed out ([18], pp. 89-90), one
needs to take care with Kakarala's notion of homogeneous space (see footnote 1).

Organization: Section 2 touches on preliminaries, developing the type of invariants used in
this work. Section 3 states the main results, on Whitney embedding (Subsection 3.2) and
Johnson-Lindenstrauss (Subsection 3.3). Technical proofs are provided in Section 4.

Supplementary Material (SM-I & SM-II): For the sake of readers who may not be
familiar with both invariant theory and representation theoretic analyses of correlation functions,
two sets of supplementary materials are provided at the very end of this manuscript. Results from 
both these topics, alluded to throughout this text, are summarized in these materials. 



2 Preliminaries 

2.1 Finite-dimensional data $\mathcal{G}$-spaces: We assume some basic familiarity with group theory.
Let $\mathcal{G}$ denote a group, where $h$ and $g$ denote group elements. Let $X$ denote a set of a
finite number $n$ of elements, and let $x$ denote an element of $X$. Define a permutation action of
the group $\mathcal{G}$ on the set $X$, where $g(x)$ is the image of $x$ under $g$, i.e.,
$g(x) \in X$. This is a left action, i.e., for $h, g \in \mathcal{G}$ we have $(hg)(x) = h(g(x))$.
A set $X$ endowed with such an action of $\mathcal{G}$ is called a $\mathcal{G}$-space.

Let $\mathbb{R}$ denote the set of real numbers. Let $\mathbb{R}[X]$ denote the set of real-valued
$n$-dimensional vectors, indexed over the set $X$. Data points lie in this set. For
$a \in \mathbb{R}[X]$, the element of $a$ indexed by $x$ is written as $a_x$ for all $x \in X$. The
space $\mathbb{R}[X]$ (and therefore also the data) inherits the group action. If $a^g$ denotes the
image of $a$ under $g$, i.e., $a^g \in \mathbb{R}[X]$, then we have $(a^g)_{g(x)} = a_x$ for any
$x \in X$. By the left action of $\mathcal{G}$ on $X$ given above, it follows that
$a^{hg} = (a^g)^h$. While $\mathbb{R}[X]$ can be identified with $\mathbb{R}^n$, the notation
$\mathbb{R}[X]$ emphasizes the group action. We illustrate using the following examples.
Let $e$ denote the group identity element of $\mathcal{G}$, and let $\#X$ be the cardinality of $X$.

Example. [Periodic data]: Let $X = \{1, 2, \ldots, n\}$. Let $\mathcal{G}$ denote the $n$-th order
cyclic group, i.e., $\mathcal{G} = \{e, g, g^2, \ldots, g^{n-1}\}$, whereby $\mathcal{G}$ acts on $X$
as follows: for the special element $g$, we have $g(i) = i + 1$ for $1 \leq i < n$, and $g(n) = 1$.
This action is transitive.

Example. [Choice & graphical data]: Let $X$ be the set of size-$\omega$ subsets of
$\{1, 2, \ldots, \ell\}$, where the size $\#X = \binom{\ell}{\omega}$. Let $\mathrm{Sym}_\ell$ be the
symmetric group (or the group of all permutations) on $\ell$ letters. Consider the group action of
$\mathrm{Sym}_\ell$ on $X$, where for any $g \in \mathrm{Sym}_\ell$, we have the image
$g(V) = \{g(i) : i \in V\}$ for any $V \in X$. This action is transitive. The special case
$\omega = 2$ corresponds to graphical data, as any graph is defined by the specification of
$\binom{\ell}{2}$ edges.
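The inherited action on data vectors can be sketched in a few lines of Python for the periodic-data example. This is only an illustration: the helper name `act` and the 0-based indexing of $X$ are assumptions of the sketch, not notation from the text. It applies the rule $(a^g)_{g(x)} = a_x$ and checks the composition identity $a^{hg} = (a^g)^h$ for two cyclic shifts.

```python
def act(a, shift):
    """Return a^g for the cyclic shift g(x) = x + shift (mod n).

    Implements (a^g)_{g(x)} = a_x with X = {0, ..., n-1}; the 0-based
    indexing is an assumption of this sketch (the text indexes X from 1).
    """
    n = len(a)
    image = [None] * n
    for x in range(n):
        image[(x + shift) % n] = a[x]
    return image


a = [3.0, 1.0, 4.0, 1.0, 5.0]
g, h = 1, 2                      # two group elements, stored as shift amounts

a_hg = act(a, (h + g) % len(a))  # image of a under the product hg
a_g_then_h = act(act(a, g), h)   # (a^g)^h: apply g first, then h
composition_holds = (a_hg == a_g_then_h)
```

For a cyclic group the product of two shifts is again a shift, which is why the group element can be stored as a single integer here; a general permutation group would need explicit permutations.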



1 Kakarala's formulation of homogeneous spaces is different from that of Kondor (see Supplementary Material SM-II.1).
Kondor points out that Kakarala's definition, in some cases, "do not model real-world problems as well". We tend to agree.

More generally, one would let $\mathcal{G}$ act on $\mathbb{R}^n$ as a matrix group - as in invariant
theory [12, 14]. For simplicity, we focus only on permutation groups, which in fact covers all data
models that apply for triple-correlation invariants [17-19].



2.2 $\mathcal{G}$-invariants with certain linearity properties: We provide the bare minimum of
background on invariant theory. Those familiar with this material may find our presentation
unconventional, as the material is discussed in the way that we feel best supports the exposition of
our main results.

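We build a tensor space using the vector space $\mathbb{R}[X]$. For $\omega \geq 1$, let
$X^{\times\omega}$ denote the product set $X \times \cdots \times X$ between $\omega$ copies of $X$.
Then an $\omega$-array, denoted $[[b_{x^{(1:\omega)}}]]$, has components $b_{x^{(1:\omega)}}$ indexed
over $X^{\times\omega}$, i.e., $x^{(1:\omega)} \in X^{\times\omega}$, where $x^{(1:\omega)}$ denotes
the $\omega$-tuple $(x^{(1)}, \ldots, x^{(\omega)})$. Let $\mathbb{R}[X^{\times\omega}]$ denote the
set of all $\omega$-arrays over $X^{\times\omega}$.

The tensor (outer) product between two elements $a, a'$ in $\mathbb{R}[X]$, denoted $a \otimes a'$,
equals the array $[[a_x \cdot a'_y]]_{x, y \in X}$. Multiple tensor products, denoted
$a^{(1)} \otimes \cdots \otimes a^{(\omega)}$ for $a^{(j)} \in \mathbb{R}[X]$, $1 \leq j \leq \omega$,
follow similarly. Now $a^{(1)} \otimes \cdots \otimes a^{(\omega)} \in \mathbb{R}[X^{\times\omega}]$,
by considering the $\omega$-array $[[a^{(1)}_{x^{(1)}} \cdots a^{(\omega)}_{x^{(\omega)}}]]$. In fact
$\mathbb{R}[X^{\times\omega}]$ is isomorphic to the space obtained by taking tensor products (between
vector spaces) of $\omega$ copies of $\mathbb{R}[X]$, see [11]. For this reason
$\mathbb{R}[X^{\times\omega}]$ is called a tensor space, where the dimension (see footnote 2) of
$\mathbb{R}[X^{\times\omega}]$ equals $n^\omega$. For any $a \in \mathbb{R}[X]$, we write
$a^{\otimes\omega}$ to mean $a \otimes \cdots \otimes a$ with $\omega$ copies of $a$.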

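We now explain how the tensor space $\mathbb{R}[X^{\times\omega}]$ admits invariants. Firstly,
$X^{\times\omega}$ inherits the group action of $\mathcal{G}$ on $X$, where the image
$g(x^{(1:\omega)})$ of $x^{(1:\omega)}$ under $g$ equals $(g(x^{(1)}), \ldots, g(x^{(\omega)}))$.
This yields an action of $\mathcal{G}$ on $\mathbb{R}[X^{\times\omega}]$, where for any
$[[b_{x^{(1:\omega)}}]] \in \mathbb{R}[X^{\times\omega}]$, the image $g([[b_{x^{(1:\omega)}}]])$
under $g$ equals the $\omega$-array $[[b_{g^{-1}(x^{(1:\omega)})}]]$ (meaning that the
$x^{(1:\omega)}$-th component of the image equals $b_{g^{-1}(x^{(1:\omega)})}$). The previous action
of $\mathcal{G}$ on $X^{\times\omega}$ induces an equivalence relation on $X^{\times\omega}$,
whereby $x_1^{(1:\omega)}, x_2^{(1:\omega)} \in X^{\times\omega}$ are equivalent if there exists some
$g$ in $\mathcal{G}$ such that $g(x_1^{(1:\omega)}) = x_2^{(1:\omega)}$. The equivalence classes here
are called $\mathcal{G}$-orbits (on $X^{\times\omega}$), denoted
$\Omega_{\mathcal{G}}(X^{\times\omega})$. Each $\mathcal{G}$-orbit
$\Omega_{\mathcal{G}}(X^{\times\omega})$ will be associated with an $\omega$-array
$[[b_{x^{(1:\omega)}}]]$, as follows

$$b_{x^{(1:\omega)}} = \begin{cases} 1 & \text{if } x^{(1:\omega)} \in \Omega_{\mathcal{G}}(X^{\times\omega}), \\ 0 & \text{otherwise}. \end{cases} \qquad (2.1)$$

Finally, thinking of $\mathbb{R}[X^{\times\omega}]$ as $\mathbb{R}^{(n^\omega)}$, define an inner
product as

$$\langle [[a_{x^{(1:\omega)}}]], [[b_{x^{(1:\omega)}}]] \rangle = \sum_{x^{(1:\omega)} \in X^{\times\omega}} a_{x^{(1:\omega)}} \cdot b_{x^{(1:\omega)}} \qquad (2.2)$$

and we can construct a $\mathcal{G}$-invariant, a function whose output is invariant under action of
$\mathcal{G}$.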

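Proposition 2.1. Let $\mathcal{G}$ be a finite group, with permutation action on data space $X$. For
some $\mathcal{G}$-orbit $\Omega_{\mathcal{G}}(X^{\times\omega})$ on $X^{\times\omega}$, where
$\omega \geq 1$, let $f_{\Omega_{\mathcal{G}}(X^{\times\omega})} : \mathbb{R}[X^{\times\omega}] \to \mathbb{R}$
denote the mapping

$$f_{\Omega_{\mathcal{G}}(X^{\times\omega})} : [[a_{x^{(1:\omega)}}]] \mapsto \langle [[a_{x^{(1:\omega)}}]], [[b_{x^{(1:\omega)}}]] \rangle \qquad (2.3)$$

where $[[b_{x^{(1:\omega)}}]]$ is associated with $\Omega_{\mathcal{G}}(X^{\times\omega})$ as in
(2.1). Then $f_{\Omega_{\mathcal{G}}(X^{\times\omega})}$ is a $\mathcal{G}$-invariant, i.e., for any
$[[a_{x^{(1:\omega)}}]] \in \mathbb{R}[X^{\times\omega}]$ we have
$f_{\Omega_{\mathcal{G}}(X^{\times\omega})}(g([[a_{x^{(1:\omega)}}]])) = f_{\Omega_{\mathcal{G}}(X^{\times\omega})}([[a_{x^{(1:\omega)}}]])$
for all $g \in \mathcal{G}$.

Proof. For brevity, write $\Omega = \Omega_{\mathcal{G}}(X^{\times\omega})$. Let $g \in \mathcal{G}$.
By the earlier definition of the image of $[[a_{x^{(1:\omega)}}]]$ under $g$, the value
$f_\Omega(g([[a_{x^{(1:\omega)}}]]))$ is computed by summing the coefficients $a_{x^{(1:\omega)}}$
supported over a subset $V$, of the form $V = \{g^{-1}(x^{(1:\omega)}) : x^{(1:\omega)} \in \Omega\}$.
Since $\Omega$ is a $\mathcal{G}$-orbit, we may verify that $V$ is a
$(g\mathcal{G}g^{-1})$-orbit of $X^{\times\omega}$; here $g\mathcal{G}g^{-1}$ is a group,
$g\mathcal{G}g^{-1} = \{g\sigma g^{-1} : \sigma \in \mathcal{G}\}$. But conjugation by $g$ is an
automorphism of the group $\mathcal{G}$, so $g\mathcal{G}g^{-1} = \mathcal{G}$; hence $V = \Omega$
and we conclude the result. □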



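2 If $e_1, e_2, \ldots, e_n$ is a basis of $\mathbb{R}[X]$, then the $n^\omega$ tensors
$e_{\sigma(1)} \otimes e_{\sigma(2)} \otimes \cdots \otimes e_{\sigma(\omega)}$, over all index maps
$\sigma : \{1, \ldots, \omega\} \to \{1, \ldots, n\}$, constitute a basis for the tensor space
$\mathbb{R}[X^{\times\omega}]$, see [11].

It is important to note that the $\mathcal{G}$-invariant (2.3) is linear in its domain
$\mathbb{R}[X^{\times\omega}]$. We extend these invariants to obtain the following linear
$\mathcal{G}$-invariant $\mathcal{F}_\omega : \mathbb{R}[X^{\times\omega}] \to \mathbb{R}^{\kappa_\omega}$
of main interest, by setting

$$\mathcal{F}_\omega : [[a_{x^{(1:\omega)}}]] \mapsto (z_1, z_2, \ldots, z_{\kappa_\omega}), \qquad z_i = \#(\Omega_{\mathcal{G},i})^{-1/2} \cdot f_{\Omega_{\mathcal{G},i}}([[a_{x^{(1:\omega)}}]]), \qquad (2.4)$$

where $\kappa_\omega$ denotes the number of different $\mathcal{G}$-orbits on $X^{\times\omega}$,
numbered as $\Omega_{\mathcal{G},1}, \ldots, \Omega_{\mathcal{G},\kappa_\omega}$, and
$\omega \geq 1$. We propose to use (2.4) in the first embedding step (recall the illustration in
Figure 1(b)).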



Algorithm 2.1. $\mathcal{G}$-invariant (2.4) and embedding step one

1) for a given data point $a \in \mathbb{R}[X]$, take the $\omega$-th tensor power $a^{\otimes\omega}$

2) output the length-$\kappa_\omega$ vector $\mathcal{F}_\omega(a^{\otimes\omega})$.
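Algorithm 2.1 can be sketched directly in Python for the periodic-data example (cyclic group). This is a minimal brute-force illustration under stated assumptions: the function names `cyclic_group`, `orbits` and `invariant` are hypothetical, $X$ is indexed from 0, and the orbits are enumerated explicitly rather than by any efficient method.

```python
from itertools import product
from math import sqrt


def cyclic_group(n):
    """All n cyclic shifts of X = {0, ..., n-1}, each stored as a tuple g with g[x] = g(x)."""
    return [tuple((x + s) % n for x in range(n)) for s in range(n)]


def orbits(group, n, omega):
    """Partition X^{x omega} into G-orbits under the diagonal action g(x1,...,xw) = (g(x1),...,g(xw))."""
    seen, result = set(), []
    for t in product(range(n), repeat=omega):
        if t in seen:
            continue
        orb = {tuple(g[x] for x in t) for g in group}
        seen |= orb
        result.append(sorted(orb))
    return result


def invariant(a, group, omega):
    """F_omega(a^{tensor omega}): one normalized orbit sum per orbit, as in (2.4)."""
    z = []
    for orb in orbits(group, len(a), omega):
        total = 0.0
        for t in orb:
            term = 1.0
            for x in t:          # component of the omega-th tensor power at index t
                term *= a[x]
            total += term
        z.append(total / sqrt(len(orb)))
    return z


n, omega = 5, 2
group = cyclic_group(n)
a = [3.0, 1.0, 4.0, 1.0, 5.0]
a_shifted = a[-1:] + a[:-1]      # an equivalent data point (cyclic shift by one)
z = invariant(a, group, omega)
z_shifted = invariant(a_shifted, group, omega)
```

For $\omega = 2$ and the cyclic group, the entries of `z` are (normalized) circular autocorrelations of `a`, which is why the two equivalent inputs give the same output.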

In the upcoming Section 3 the linearity of $\mathcal{F}_\omega$ will be exploited to connect with
linear embedding theory. The normalization factor $\#(\Omega_{\mathcal{G},i})^{-1/2}$ w.r.t. orbit
cardinality in (2.4) is chosen so that $\mathcal{F}_\omega$ will have unity operator norm (to ensure
stability).

But before going on to discuss embeddings, we clarify some properties of the invariants.
Firstly, $\mathcal{F}_\omega$ has polynomial complexity of evaluation (in $n$ for fixed $\omega$),
exactly $n^\omega$. Next, the number of $\mathcal{G}$-orbits over $X^{\times\omega}$ determines the
(dimension of the) range of $\mathcal{F}_\omega$, and we call $\kappa_\omega$ the invariant
dimension. We briefly discuss how to determine $\kappa_\omega$. Let $\theta_{\mathcal{G},X}$ denote
the function on $\mathcal{G}$ that satisfies

$$\theta_{\mathcal{G},X}(g) = \#\{x \in X : g(x) = x\} \qquad (2.5)$$

for all $g \in \mathcal{G}$, i.e., the value $\theta_{\mathcal{G},X}(g)$ equals the number of points
in $X$ fixed by the permutation $g$ in $\mathcal{G}$. The classical Burnside lemma, see e.g. [25],
p. 106, allows us to determine $\kappa_\omega$ as follows

$$\kappa_\omega = \frac{1}{\#\mathcal{G}} \sum_{g \in \mathcal{G}} \left(\theta_{\mathcal{G},X}(g)\right)^\omega. \qquad (2.6)$$

Note $\theta_{\mathcal{G},X}(e) = \#X = n$ for the identity element $e$.

Example. [Periodic data]: If $\mathcal{G}$ equals the cyclic group on $n$ letters, then
$\theta_{\mathcal{G},X}(g) = 0$ for all $g \neq e$. Since $\#\mathcal{G} = \#X = n$, thus
$\kappa_\omega = n^{\omega - 1}$.
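As a sanity check of the Burnside count (2.6), the following Python sketch compares it against brute-force orbit enumeration for the cyclic group. The function names are hypothetical and $X$ is indexed from 0; both are assumptions of the sketch.

```python
from itertools import product


def cyclic_group(n):
    """All n cyclic shifts of X = {0, ..., n-1}, each stored as a tuple g with g[x] = g(x)."""
    return [tuple((x + s) % n for x in range(n)) for s in range(n)]


def theta(g):
    """Number of points of X fixed by the permutation g, as in (2.5)."""
    return sum(1 for x in range(len(g)) if g[x] == x)


def kappa_burnside(group, omega):
    """Invariant dimension via (2.6): average of theta(g)^omega over the group."""
    total = sum(theta(g) ** omega for g in group)
    return total // len(group)


def kappa_by_enumeration(group, n, omega):
    """Invariant dimension by explicitly counting orbits on X^{x omega}."""
    seen, count = set(), 0
    for t in product(range(n), repeat=omega):
        if t not in seen:
            seen |= {tuple(g[x] for x in t) for g in group}
            count += 1
    return count


n, omega = 4, 3
group = cyclic_group(n)
k_formula = kappa_burnside(group, omega)
k_counted = kappa_by_enumeration(group, n, omega)
```

Both values agree with the closed form $n^{\omega-1}$ of the periodic-data example, since only the identity shift fixes any point.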



To simplify calculation of (2.5), one may use the fact that for any $g \in \mathcal{G}$,
$\theta_{\mathcal{G},X}(\sigma g \sigma^{-1}) = \theta_{\mathcal{G},X}(g)$ for all
$\sigma \in \mathcal{G}$, see the following example. There exists an equivalence relation on elements
in $\mathcal{G}$, if we deem $h$ equivalent with $g$ if $h = \sigma g \sigma^{-1}$ for some
$\sigma \in \mathcal{G}$, see [25], p. 81.

Example. [Graphical data]: For $\mathcal{G} = \mathrm{Sym}_\ell$ with some integer $\ell$, by the
above relation there exists a bijection between equivalence classes and the unordered partitions of
$\ell$, see [25], ch. 10. For example, we can express $\ell = 3$ as $1 + 1 + 1$, $2 + 1$, and $3$; in
the first partition three 1's appear, in the second partition one 1 appears and one 2 appears. One
can use this bijection to show that
$\theta_{\mathcal{G},X}(g) = \{\#\text{ of 2's appearing}\} + \binom{\{\#\text{ of 1's appearing}\}}{2}$
for the partition corresponding to $g$.
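The fixed-point formula of the graphical-data example can be checked by brute force for small $\ell$. The sketch below is illustrative only; the function names are hypothetical, nodes are labelled from 0, and the cycle type of a permutation plays the role of the partition of $\ell$.

```python
from itertools import combinations, permutations
from math import comb, factorial


def cycle_lengths(perm):
    """Cycle type of a permutation given as a tuple with perm[i] = image of i."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        length, x = 0, start
        while x not in seen:
            seen.add(x)
            x = perm[x]
            length += 1
        lengths.append(length)
    return lengths


def theta_direct(perm):
    """Count the 2-subsets {i, j} (edges) mapped to themselves by perm."""
    return sum(1 for i, j in combinations(range(len(perm)), 2)
               if {perm[i], perm[j]} == {i, j})


def theta_formula(perm):
    """(# of 2's) + binom(# of 1's, 2) for the partition given by the cycle type."""
    lengths = cycle_lengths(perm)
    return lengths.count(2) + comb(lengths.count(1), 2)


ell = 4
all_perms = list(permutations(range(ell)))
all_agree = all(theta_direct(p) == theta_formula(p) for p in all_perms)
# Burnside (2.6) with omega = 1: number of orbits of Sym_ell on edges.
kappa_1 = sum(theta_direct(p) for p in all_perms) // factorial(ell)
```

An edge is fixed either when both endpoints are fixed points or when the endpoints are swapped by a 2-cycle, which is exactly what the two terms of the formula count; `kappa_1` equal to one reflects that the action on edges is transitive.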

Remark 2.1. In invariant theoretic terms, the $\mathcal{G}$-invariant $\mathcal{F}_\omega$ is
equivalent to a generating set of the degree-$\omega$ homogeneous polynomials in the invariant ring,
see supplementary material SM-I.1. Due to interest in applying invariants for classification, there
is recent focus on studying minimal sets of invariants that discriminate between all data points,
i.e., any $a_1, a_2 \in \mathbb{R}[X]$ are never mistaken if $a_1 \neq a_2^g$ for all
$g \in \mathcal{G}$, see [14] (Theorem SM-I.1). Unfortunately such powerful discriminability
properties come at super-exponential complexity (Fact SM-I.1). Thus, it is meaningful to ask, for a
given invariant $\mathcal{F}_\omega$, what are the pairs of data points that it cannot discriminate.
For $\mathcal{F}_\omega$, this amounts to looking at an affine algebraic variety, see supplementary
material SM-I.2. In particular for $\mathcal{G}$-spaces with transitive action, we can view
$\mathcal{F}_\omega$ as a multi-correlation function (see SM-II.1), and relate to completeness
results for the triple-correlation [17-19] (see SM-II.2).



3 Two Theorems on Low-Dimensional Linear Embeddings of Data-Invariants 
3.1 Two-step linear embedding (Figure 1(b)): For some $\omega \geq 1$, first apply the
$\mathcal{G}$-invariant $\mathcal{F}_\omega$ in Algorithm 2.1 to place the data (some
$a \in \mathbb{R}[X]$) in $\kappa_\omega$ dimensions. Next, use a linear map
$\Phi : \mathbb{R}^{\kappa_\omega} \to \mathbb{R}^m$ to effect the dimension reduction, whereby
$m < \min(\kappa_\omega, n)$. Specifically, compute

$$\Phi \mathcal{F}_\omega(a^{\otimes\omega}), \qquad (3.7)$$

where for convenience $\Phi\mathcal{F}_\omega$ will stand for the concatenation of the map
$\mathcal{F}_\omega$ followed by the map $\Phi$. Clearly $\Phi\mathcal{F}_\omega$ is a
$\mathcal{G}$-invariant, linear in the domain $\mathbb{R}[X^{\times\omega}]$, and drops dimensions
down to $m$.
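The two-step map (3.7) can be sketched end to end in Python for periodic data. This is a hedged illustration, not a reference implementation: all function names are hypothetical, $X$ is indexed from 0, the orbits are enumerated by brute force, and $\Phi$ is drawn with independent normal entries of variance $1/m$ (the sampling assumed later in Theorem 3.2) using a fixed seed.

```python
import random
from itertools import product
from math import sqrt


def cyclic_group(n):
    """All n cyclic shifts of X = {0, ..., n-1}."""
    return [tuple((x + s) % n for x in range(n)) for s in range(n)]


def orbits(group, n, omega):
    """G-orbits on X^{x omega} under the diagonal action."""
    seen, result = set(), []
    for t in product(range(n), repeat=omega):
        if t not in seen:
            orb = {tuple(g[x] for x in t) for g in group}
            seen |= orb
            result.append(sorted(orb))
    return result


def invariant(a, orbit_list):
    """Step one: F_omega(a^{tensor omega}), normalized orbit sums as in (2.4)."""
    z = []
    for orb in orbit_list:
        total = 0.0
        for t in orb:
            term = 1.0
            for x in t:
                term *= a[x]
            total += term
        z.append(total / sqrt(len(orb)))
    return z


def sample_phi(m, kappa, rng):
    """Step two: an m x kappa matrix with i.i.d. normal entries of variance 1/m."""
    return [[rng.gauss(0.0, 1.0 / sqrt(m)) for _ in range(kappa)] for _ in range(m)]


def embed(a, orbit_list, phi):
    """Two-step embedding Phi F_omega(a^{tensor omega}) of (3.7)."""
    z = invariant(a, orbit_list)
    return [sum(p * zi for p, zi in zip(row, z)) for row in phi]


n, omega, m = 6, 2, 3
orbit_list = orbits(cyclic_group(n), n, omega)
phi = sample_phi(m, len(orbit_list), random.Random(0))
a = [2.0, 7.0, 1.0, 8.0, 2.0, 8.0]
a_shifted = a[-2:] + a[:-2]          # equivalent data point (cyclic shift by two)
y = embed(a, orbit_list, phi)
y_shifted = embed(a_shifted, orbit_list, phi)
```

Because step one already removes the group action, any choice of $\Phi$ gives the same output for equivalent inputs; the choice of $\Phi$ only matters for distinguishing non-equivalent points.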

We desire embeddings that map the data set, some $V \subset \mathbb{R}[X]$, onto the lower
dimensional space in some injective manner. This is possible only when the embedding dimension $m$ is
sufficiently large to accommodate the data set. The key here is that $m$ can be much smaller than the
ambient data dimension $n$, where $m$ should really only be tied to the size of $V$. Linear
embeddings have been studied for when $V$ is a union of subspaces [5, 6, 20], and a smooth manifold
[1, 4, 10]. Here we look at the case where $V$ comes from a finite-dimensional, non-sequential
$\mathcal{G}$-space for a finite group $\mathcal{G}$. We derive analogues of two well-known
embedding theorems, in this two-step setting that employs $\mathcal{G}$-invariants, for both the
Whitney embedding theorem (Subsection 3.2) and the Johnson-Lindenstrauss lemma (Subsection 3.3).



3.2 How many dimensions are needed to embed non-sequential data? In Whitney
embedding we consider $V$ to be a bounded subset of $\mathbb{R}[X]$. The size of a bounded subset
$V$ will be measured by the box-counting dimension. For a bounded subset $V$, we define: i) the
closure $\overline{V}$, and ii) the minimal number $N_\epsilon(\overline{V})$ of boxes with sides of
length $\epsilon$ (in $\mathbb{R}[X]$) required to cover $\overline{V}$, in a grid. The box-counting
dimension is then defined as

$$\mathrm{boxdim}(\overline{V}) = \lim_{\epsilon \to 0} \frac{\log N_\epsilon(\overline{V})}{-\log \epsilon} \qquad (3.8)$$

if the limit exists. Roughly speaking, if $\mathrm{boxdim}(\overline{V}) = d$, then
$N_\epsilon(\overline{V}) \approx \epsilon^{-d}$. The lower box-counting dimension, denoted
$\underline{\mathrm{boxdim}}(\overline{V})$, is defined regardless by replacing the limit by lim inf.
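To make (3.8) concrete, the ratio $\log N_\epsilon / (-\log \epsilon)$ can be evaluated at a single finite scale for a sampled set. The Python sketch below does this for a diagonal line segment in the plane; the function names, the sample set and the scale $\epsilon = 1/64$ are all choices of this illustration, and a finite-scale ratio only approximates the limit.

```python
from math import floor, log


def box_count(points, eps):
    """Number of grid boxes of side eps that contain at least one sample point."""
    return len({tuple(floor(c / eps) for c in p) for p in points})


def boxdim_estimate(points, eps):
    """Finite-scale version of (3.8): log N_eps / (-log eps)."""
    return log(box_count(points, eps)) / (-log(eps))


# Dense samples of the segment from (0, 0) to (1, 1), a one-dimensional set.
segment = [(i / 10000.0, i / 10000.0) for i in range(10000)]
eps = 1.0 / 64
n_boxes = box_count(segment, eps)
estimate = boxdim_estimate(segment, eps)
```

The segment meets one grid box per diagonal step, so the count scales like $\epsilon^{-1}$ even though the ambient space is two-dimensional, matching the rough rule $N_\epsilon \approx \epsilon^{-d}$ with $d = 1$.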



From our two-step embedding (3.7), the map $\Phi\mathcal{F}_\omega$ cannot produce a one-to-one
embedding for $V$, since the linear tensor invariant $\mathcal{F}_\omega$ is not always one-to-one on
$\omega$-th tensor powers of $V$. On the other hand, we do not care to discriminate between
equivalent data points. Thus to state what is an appropriate or desirable embedding, we first define
a canonical notion of elements in $\mathbb{R}[X]$, which are the only elements we discriminate
between. To this end, define the following disjoint subsets of $\mathbb{R}[X]$.
For $a \in \mathbb{R}[X]$, we say $a$ is un-fixable if $a^g \neq a$ is satisfied for all
$g \in \mathcal{G}$, $g \neq e$. Let $\mathcal{R}$ denote an open set in $\mathbb{R}[X]$. Let
$\mathcal{R}$ satisfy the following 3 properties: i) all elements of $\mathcal{R}$ are un-fixable,
ii) the $\#\mathcal{G}$ subsets $\{a^g : a \in \mathcal{R}\}$, one for each $g \in \mathcal{G}$, are
disjoint, and iii) the union $\bigcup_{g \in \mathcal{G}} \{a^g : a \in \mathcal{R}\}$ contains all
un-fixable elements in $\mathbb{R}[X]$. There are exactly $\#\mathcal{G}$ disjoint (see footnote 3)
open sets in $\mathbb{R}[X]$ that satisfy the above properties. We call these open sets fundamental
regions, and any one of them will give us our required canonical notion. For
$V \subset \mathbb{R}[X]$, a set of canonical elements can be
$\{a^g \in \mathcal{R} : a \in V, g \in \mathcal{G}\}$, which we denote by $V_{\mathcal{R}}$ for
brevity. Our hypothesis on discriminability is now stated formally: a $\mathcal{G}$-invariant is
said to be discriminable over a subset $V$, if this function is one-to-one over $V_{\mathcal{R}}$
where $\mathcal{R}$ is any fundamental region (note that this definition does not depend on the
choice of $\mathcal{R}$).



The following theorem is an analogue of Theorem 2.2 in [24], for two-step linear embeddings (3.7)
over finite-dimensional $\mathcal{G}$-spaces.

3 Since if $\mathcal{R}$ satisfies these conditions, then $\{a^g : a \in \mathcal{R}\}$ for any $g \in \mathcal{G}$ also satisfies them.



Theorem 3.1. Let $\mathcal{G}$ be a finite group. Let $X$ be a finite-dimensional
$\mathcal{G}$-space. For some $\omega \geq 1$, let $\mathcal{F}_\omega$ be the
$\mathcal{G}$-invariant in (2.4). Let $\mathcal{R}$ be any fundamental region.

Let $V$ denote the data set, $V \subset \mathbb{R}[X]$, and assume $V$ is bounded. Assume
$\mathcal{F}_\omega$ is discriminable over $V$, and let
$k = \mathrm{boxdim}(\overline{V_{\mathcal{R}}})$, where we assume this limit $k$ exists.

Let $\Phi$ be a linear map that drops dimension from $\kappa_\omega$ to $m$. If $m > 2k$, then for
almost all such linear maps $\Phi$, the concatenated map $\Phi\mathcal{F}_\omega$ will be
discriminable over $V$.

The two-step linear embedding (3.7), with embedding dimension exceeding twice the box-counting
dimension of the canonical part of the data set, is guaranteed to appropriately embed a data set $V$
as long as the linear tensor $\mathcal{G}$-invariant $\mathcal{F}_\omega$ is discriminable over $V$.



We make three comments on Theorem 3.1, starting with storage complexity. In its original
version [24] for sequence data spaces, the value $k$ is taken as the box-counting dimension of the
(closure of the) whole data set $V$. Here however, $k = \mathrm{boxdim}(\overline{V_{\mathcal{R}}})$,
which may be as much as $\#\mathcal{G}$ times lower than $\mathrm{boxdim}(\overline{V})$. This
"factor of $\#\mathcal{G}$ savings" is intuitively expected, as we only need to differentiate between
canonical elements in $\mathcal{R}$. Similar savings have been reported in other works [9, 26].

Secondly, the computational complexity of evaluating $\Phi\mathcal{F}_\omega$ is exactly
$mn^\omega$, polynomial in data dimension $n$ (for fixed $m, \omega$). Each coordinate of
$\Phi\mathcal{F}_\omega$ is obtained by a weighted average of linear functions
$f_{\Omega_{\mathcal{G},i}(X^{\times\omega})}$, $1 \leq i \leq \kappa_\omega$.

Thirdly, the linearity of $\Phi\mathcal{F}_\omega$ may be exploited to reduce computation. For
example in [19], Kondor et al. used a subspace of $\mathbb{R}[\mathrm{Sym}_\ell]$ to represent (see
footnote 4) graphical data on $\ell$ nodes, a $(\mathrm{Sym}_\ell)$-space where $n = \ell!$, see
[18]. Now if the data lives in a $k$-dimensional subspace $V$ of $\mathbb{R}[X]$, $k < n$, let
$A : \mathbb{R}^k \to V$ be a linear map onto $V$. Then the tensor product map
$A^{\otimes\omega} : \mathbb{R}^{k^\omega} \to V^{\otimes\omega}$, where
$V^{\otimes\omega} \subset \mathbb{R}[X^{\times\omega}]$, is linear in its domain
$\mathbb{R}^{k^\omega}$. Now the concatenated map from $\mathbb{R}^{k^\omega}$ to $\mathbb{R}^m$
will be $\Phi\mathcal{F}_\omega A^{\otimes\omega}$, where each coordinate is obtained by a map
obtained from a weighted average of functions
$f_{\Omega_{\mathcal{G},j}(X^{\times\omega})} A^{\otimes\omega}$, $1 \leq j \leq \kappa_\omega$, and
this map is linear (and can be evaluated in $k^\omega$ operations). Hence, the total evaluation
complexity of $\Phi\mathcal{F}_\omega A^{\otimes\omega}$ to $\mathbb{R}^m$ equals $mk^\omega$, where
again $k$ is the data dimension. In the above example where $X = \mathrm{Sym}_\ell$, we have
$k = \binom{\ell}{2}$, so the complexity equals $O(m\ell^{2\omega})$, which (for fixed $m, \omega$)
is polynomial in the number of nodes $\ell$.

3.3 How many dimensions are needed to preserve isometries of non-sequential data? 



Theorem 3.1 does not provide any notion of distance isometries under embedding, important for
certain "sketching"-type applications. An important result for isometry preservation is the
Johnson-Lindenstrauss lemma. In this part, the data set $V$ will be assumed to contain a finite
number of discrete points in $\mathbb{R}[X]$. Also here, we state the discriminability hypothesis
slightly differently. By the 2-norm $\|\cdot\|_2$ on elements in $\mathbb{R}[X^{\times\omega}]$, we
mean the norm

$$\left\| [[b_{x^{(1:\omega)}}]] \right\|_2 = \sqrt{\sum_{x^{(1:\omega)} \in X^{\times\omega}} b_{x^{(1:\omega)}}^2} \qquad (3.9)$$

as if we were treating $\mathbb{R}[X^{\times\omega}]$ as $\mathbb{R}^{(n^\omega)}$. Assuming that
$\mathcal{F}_\omega$ is discriminable over $V$, there must exist some constant $\delta < 1$ such that
for any $a_1, a_2 \in V_{\mathcal{R}}$, where $\mathcal{R}$ is any fundamental region, we have

$$\left\| A_{\mathcal{F}_\omega}(a_1^{\otimes\omega} - a_2^{\otimes\omega}) \right\|_2^2 \leq \delta \cdot \left\| a_1^{\otimes\omega} - a_2^{\otimes\omega} \right\|_2^2 \qquad (3.10)$$

where $A_{\mathcal{F}_\omega} : \mathbb{R}[X^{\times\omega}] \to \mathbb{R}[X^{\times\omega}]$ is the
orthogonal projection onto the kernel of $\mathcal{F}_\omega$. That is, for canonical elements
$a_1, a_2 \in V_{\mathcal{R}}$, the constant $\delta$ captures the maximal fraction of "energy" of the
error $a_1^{\otimes\omega} - a_2^{\otimes\omega}$ in the kernel of $\mathcal{F}_\omega$.

4 Kondor et al. represented each datum corresponding to an edge in a redundant fashion, using multiple coefficients $a_x$ of
$a \in \mathbb{R}[\mathrm{Sym}_\ell]$, for all $x$ that send $\{\ell - 1, \ell\}$ to

The following theorem is an analogue of (the most basic form of) the Johnson-Lindenstrauss
lemma, for two-step linear embeddings (3.7) over finite $\mathcal{G}$-spaces. The result is stated
for the case where the coefficients of $\Phi$ are sampled from the normal distribution. However, as
in many works [2, 3, 6, 22], extensions to more general distributions should not be too difficult.



Theorem 3.2. We take $X$, $\mathcal{G}$, $\mathbb{R}[X]$ and $\mathcal{R}$ as defined in Theorem 3.1.
Let $V$ contain a finite number of discrete points in $\mathbb{R}[X]$. Let $k = \#V_{\mathcal{R}}$.
For some $\omega \geq 1$, assume $\mathcal{F}_\omega$ is discriminable over $V$, and that the
constant $\delta < 1$ satisfies (3.10). Assume that the size $m \times \kappa_\omega$ linear map
$\Phi$ has coefficients independently sampled from a normal distribution with variance $1/m$. Then
with probability at least $1 - \beta$, if the embedding dimension $m$ of the map $\Phi$ exceeds

21ogfc + log(l//3) 



where a(y) 
isometries 



y 



y 3 for any y £ 



a((e-5)/(l-6)) (JU1) 
we will have for any ai,a2 G V, ai ^ 8i2, the following 



< 
> 



|bf w 
lb? 



v _:i-6)- l|U1 

for any positive e > 5, and canonical elements bi,b2 (where h\ 
91,92 G Q such that bi,bi € Viz)- 



i_.®u>||2 
D 2 Il2> 



(3.12) 

ai ffl and i>2 = &2 92 for some 



u 2 Il2> 



The factor $\epsilon$ in (3.12) should not be too close to the constant $\delta$ in (3.10), as this
increases the required value for $m$ (it affects the denominator of (3.11)). Also, again thanks to
$\mathcal{G}$-invariance, the (potential) "factor of $\#\mathcal{G}$" savings appear in $k$ (here
$k = \#V_{\mathcal{R}}$, not $k = \#V$). Do note there is a difference in how these savings impact
the embedding dimension $m$; unlike the previous Theorem 3.1, where the factor of $\#\mathcal{G}$
impacts $m$ multiplicatively (seen from the required assumption $m > 2k$), in Theorem 3.2 this
factor impacts $m$ logarithmically (seen from (3.11)). Also, as seen from (3.12), the isometries are
measured in the tensor space $\mathbb{R}[X^{\times\omega}]$ (not in the data space
$\mathbb{R}[X]$). If one desires isometries in the original space, one requires some equivalence
between the 2-norms of both spaces $\mathbb{R}[X]$ and $\mathbb{R}[X^{\times\omega}]$, not addressed
here.

The next section provides technical proofs for Theorems 3.1 and 3.2.



4 Technical Proofs 
4.1 Proof of Theorem 3.1 



The proof follows relatively closely that of [24], though the consideration of
$\mathcal{G}$-invariants allows certain simplifications; also see [5].

First some new notation. For any $a \in \mathbb{R}^n$, for some positive integer $n$, we denote by
$B_n(a, \epsilon)$ the $n$-dimensional ball of radius $\epsilon$, centered at $a$. For any map,
sometimes denoted $A$ here, and for any set $V$ that lies in the range of $A$, we shall use
$A^{-1}(V)$ to denote the pre-image of $V$. For any $V \subset \mathbb{R}^n$ for any $n$, we denote
the volume of $V$ as $\mathrm{vol}(V)$. We will need the following two lemmas, simplified from [24].
For convenience, the lemma proofs are reproduced in Appendix A.

Lemma 4.1. (c.f. Lemma 4.2, [24]) For some positive integers $r, m$, $m \leq r$, let $A$ be some
surjective linear map from $\mathbb{R}^r$ to $\mathbb{R}^m$. Let $\sigma > 0$ be the smallest
singular value of $A$, obtained from any matrix form for $A$. Then for any $\epsilon > 0$

vol(A- 1 (i3 m (e))nS r ((5)) 



< 



2-/2. 



e 

V5 



(4.13) 



vo\{B r {5)) 

where B r {e) and B m (e) are respectively r- and m- dimensional balls centered at the origin. 
Lemma 4.2. (c.f. Lemma 4.3, [24]) Let $\mathcal{V}$ be a bounded subset of $\mathbb{R}^n$, with $k = \mathrm{boxdim}(\mathcal{V})$, and we assume this limit $k$ exists. Let $\rho_1, \cdots, \rho_r$ be $r$ number of Lipschitz maps from $\mathbb{R}^n$ to $\mathbb{R}^m$. Further assume that for each $a \in \mathcal{V}$, the linear map $A : \mathbb{R}^r \to \mathbb{R}^m$ described by the matrix $[\rho_1(a), \cdots, \rho_r(a)]$, is surjective.

For each $\beta \in \mathbb{R}^r$ with bounded 2-norm, $\beta = [\beta_1, \cdots, \beta_r]$, define $\rho_\beta = \sum_{i=1}^{r} \beta_i \rho_i$. Then for almost every such bounded $\beta$, the preimage $\rho_\beta^{-1}(0)$ of the map $\rho_\beta$ w.r.t. the single point 0, has lower box-counting dimension at most $k - m$. If $k < m$, then $\rho_\beta^{-1}(0)$ is empty for almost every $\beta$.
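Proof. [Proof of Theorem 3.1] We begin by making a connection with Lemma 4.2, first specifying for some positive integers $n_2, r$, the Lipschitz maps $\rho_1, \cdots, \rho_r$ (where each $\rho_i : \mathbb{R}^{n_2} \to \mathbb{R}^m$), and vectors $\beta$ in $\mathbb{R}^r$. Note, here $n_2$ replaces $n$ in Lemma 4.2. The domain $\mathbb{R}^{n_2}$, where $n_2 = n^{\omega}$, is identified with $\mathbb{R}[\mathcal{X}^{\times\omega}]$, and we set the maps $\rho_i : \mathbb{R}[\mathcal{X}^{\times\omega}] \to \mathbb{R}^m$ as

$$\rho_{i+m(j-1)} : [a_{x^{(1:\omega)}}] \mapsto \#(\Omega_{\mathcal{G},j})^{-1/2} \cdot f_{\Omega_{\mathcal{G},j}}([a_{x^{(1:\omega)}}]) \cdot e_i$$ (4.14)

using the 1-Lipschitz functions $f_{\Omega_{\mathcal{G},j}}$ appearing in (2.4), for all $1 \le i \le m$, $1 \le j \le \kappa_\omega$,
and where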

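$e_1, \cdots, e_m$ constitute any basis of $\mathbb{R}^m$. Thus here $r = m\kappa_\omega$, and we associate each vector $\beta$ in $\mathbb{R}^{m\kappa_\omega}$ with the linear map $\Phi : \mathbb{R}^{\kappa_\omega} \to \mathbb{R}^m$, where $\beta$ is formed by column-wise stacking of the coefficients from the matrix representation of $\Phi$. Under these associations, it becomes clear that the map $\rho_\beta : \mathbb{R}[\mathcal{X}^{\times\omega}] \to \mathbb{R}^m$ in the statement of Lemma 4.2 equals $\Phi\circ\mathcal{F}_\omega$.

Let $\mathcal{V}^{(2)} = \{a_1^{\otimes\omega} - a_2^{\otimes\omega} : a_1, a_2 \in \mathcal{V}_{\mathcal{R}},\ a_1 \neq a_2\}$, i.e., $\mathcal{V}^{(2)}$ is (homomorphic) to the set of non-equal pairs of $\mathcal{V}_{\mathcal{R}}$. We want to apply Lemma 4.2 with $\mathcal{V}^{(2)}$ replacing $\mathcal{V}$, with $2k$ replacing $k$ (since $\mathrm{boxdim}(\mathcal{V}^{(2)}) \le 2k$). If the lemma applies, this shows one-to-one mapping on $\mathcal{V}_{\mathcal{R}}$, which proves the theorem. To do so, we need to show that for each $[a_{x^{(1:\omega)}}] \in \mathcal{V}^{(2)}$, the linear map $A : \mathbb{R}^{m\kappa_\omega} \to \mathbb{R}^m$ as described in the statement of Lemma 4.2, is surjective. This will follow from the hypothesis that $\mathcal{F}_\omega$ is discriminable over $\mathcal{V}$, which implies that for each $[a_{x^{(1:\omega)}}] \in \mathcal{V}^{(2)}$, there exists some function $f_{\Omega_{\mathcal{G},j}}$, $1 \le j \le \kappa_\omega$, such that $f_{\Omega_{\mathcal{G},j}}([a_{x^{(1:\omega)}}]) \neq 0$. By the association of $A$ with the matrix $[\rho_1([a_{x^{(1:\omega)}}]), \cdots, \rho_{m\kappa_\omega}([a_{x^{(1:\omega)}}])]$, from (4.14) we conclude that since $f_{\Omega_{\mathcal{G},j}}([a_{x^{(1:\omega)}}]) \neq 0$ for some $j$, the map $A$ will indeed be surjective. Thus the result is proved. □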

The key to the proof is the discriminability hypothesis. The important point is that $\omega$ does not impact embedding dimension $m$; here $m$ is tied directly to data size (tied to $k = \mathrm{boxdim}(\mathcal{V}_{\mathcal{R}})$). We also point out that while Sauer et al. discuss more generalized versions of Lemmas 4.1 and 4.2 that do not require surjectivity of $A$ (see [24], Lemma 4.6), these generalizations are not useful here. This is because, as our proof of Theorem 3.1 reveals, the map $A$ is either surjective (in the case discriminability holds) or otherwise the zero-map (in the case discriminability does not hold).
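
The surjective-or-zero dichotomy can be seen directly from the column structure in (4.14). The following worked block is a sketch under one notational assumption: $c_j$ is shorthand (not used in the original) for the scalar that multiplies $e_i$ in the column $\rho_{i+m(j-1)}$, evaluated at a fixed point of $\mathcal{V}^{(2)}$, and $E$ denotes the invertible matrix whose columns are the basis vectors.

```latex
% Assumed shorthand: c_j is the scalar coefficient of e_i in (4.14), E = [e_1, ..., e_m].
\[
A \;=\; \bigl[\, c_1 E \;\; c_2 E \;\; \cdots \;\; c_{\kappa_\omega} E \,\bigr]
  \;=\; c^{T} \otimes E ,
\qquad c = (c_1, \dots, c_{\kappa_\omega})^{T} .
\]
\[
\operatorname{rank}(A) \;=\;
\begin{cases}
  m, & \text{if } c_j \neq 0 \text{ for some } j \quad (A \text{ is surjective}),\\[2pt]
  0, & \text{if } c = 0 \quad (A \text{ is the zero-map}).
\end{cases}
\]
```

Since $E$ is invertible, a single nonzero $c_j$ already gives a block $c_j E$ of full rank $m$, which is why no intermediate rank can occur.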



4.2 Proof of Theorem 3.2

The proof here also follows with simple modifications, by appropriately incorporating discriminability notions. Standard concentration results, such as the following one, will be useful (for convenience, its proof is reproduced in Appendix A).

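Lemma 4.3. (c.f. [2], [3]) Let $A$ be an $m \times \ell$ random matrix, whose matrix entries are standard normal RVs. Let the rows of $A$ be independent. Then for any $x \in \mathbb{R}^{\ell}$, for any $\epsilon > 0$ we have

$$\Pr\Big\{\, \big|\, \|(1/\sqrt{m})\cdot Ax\|_2^2 - \|x\|_2^2 \,\big| < \epsilon \Big\} \;\ge\; 1 - 2e^{-\frac{m}{4}(\epsilon^2-\epsilon^3)}$$ (4.15)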

The proof of Theorem 3.2 given below will follow for other (row independent) distributions of $A$, if probabilistic inequalities similar to (4.15) are available. Indeed they are for many other distributions, see e.g., [2], [3], [27]. We do not go further into detail since this component is not our main focus. We use Lemma 4.3 to prove our second main theorem.
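
A quick Monte Carlo check can make the concentration statement in Lemma 4.3 tangible. The sketch below is illustrative only: the vector `x`, the sizes, and the seed are arbitrary choices, and the deviation is measured relative to $\|x\|_2^2$ (which matches (4.15) exactly when $\|x\|_2 = 1$, the normalization used in the appendix proof).

```python
import math
import random

def squared_norm_after_projection(x, m, rng):
    """Return ||(1/sqrt(m)) * A x||_2^2 for an m-row matrix A of i.i.d. N(0,1) entries."""
    total = 0.0
    for _ in range(m):
        row_dot = sum(rng.gauss(0.0, 1.0) * xi for xi in x)
        total += row_dot * row_dot
    return total / m

def failure_rate(x, m, eps, trials, seed=0):
    """Fraction of trials in which the squared norm deviates by at least eps * ||x||^2."""
    rng = random.Random(seed)
    norm_sq = sum(xi * xi for xi in x)
    failures = 0
    for _ in range(trials):
        y = squared_norm_after_projection(x, m, rng)
        if abs(y - norm_sq) >= eps * norm_sq:
            failures += 1
    return failures / trials

def lemma_bound(m, eps):
    """Failure probability bound 2 * exp(-(m/4) * (eps^2 - eps^3)) from (4.15)."""
    return 2.0 * math.exp(-(m / 4.0) * (eps ** 2 - eps ** 3))

x = [3.0, -1.0, 2.0, 0.5]
rate = failure_rate(x, m=100, eps=0.5, trials=200)
print("empirical failure rate:", rate, "bound:", lemma_bound(100, 0.5))
```

The empirical rate is typically far below the bound, which is expected since the bound comes from a Chernoff-type argument rather than an exact tail computation.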
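Proof. [Proof of Theorem 3.2] It suffices to show the result for pairs $a_1, a_2 \in \mathcal{V}_{\mathcal{R}}$, $a_1 \neq a_2$, of canonical elements, since the LHS of (3.12) remains constant when replacing $a_1, a_2$ with $b_1, b_2$. For uniformly sampled $\Phi$ (recall lemma statement) as $A = \sqrt{m}\,\Phi$, the probability that

$$\|\Phi\circ\mathcal{F}_\omega(a_1^{\otimes\omega} - a_2^{\otimes\omega})\|_2^2 \;\begin{cases} \le (1+\epsilon)\cdot\|\mathcal{F}_\omega(a_1^{\otimes\omega} - a_2^{\otimes\omega})\|_2^2, \\ \ge (1-\epsilon)\cdot\|\mathcal{F}_\omega(a_1^{\otimes\omega} - a_2^{\otimes\omega})\|_2^2, \end{cases}$$ (4.16)

holds for all $\binom{k}{2} < k^2/2$ pairs whereby $a_1, a_2 \in \mathcal{V}_{\mathcal{R}}$, is at least $1 - k^2 \cdot e^{-\frac{m}{4}(\epsilon^2-\epsilon^3)}$. Here we used Lemma 4.3 for each $x = \mathcal{F}_\omega(a_1^{\otimes\omega} - a_2^{\otimes\omega})$, $x \in \mathbb{R}^{\kappa_\omega}$. Comparing (4.16) with (3.12), the norm $\|\cdot\|_2$ on the RHS needs to be applied on $\mathbb{R}[\mathcal{X}^{\times\omega}]$, not $\mathbb{R}^{\kappa_\omega}$. Recall from its definition, see (2.4), that $\mathcal{F}_\omega$ is 1-Lipschitz and linear in $\mathbb{R}[\mathcal{X}^{\times\omega}]$, so the upper bound follows as

$$\|\mathcal{F}_\omega(a_1^{\otimes\omega} - a_2^{\otimes\omega})\|_2^2 \;\le\; \|a_1^{\otimes\omega} - a_2^{\otimes\omega}\|_2^2 .$$

For the lower bound, we use the hypothesis that $\mathcal{F}_\omega$ is $\delta$-discriminable over $\mathcal{V}$, where for the orthogonal projection $A_{\mathcal{F}_\omega} : \mathbb{R}[\mathcal{X}^{\times\omega}] \to \mathbb{R}[\mathcal{X}^{\times\omega}]$ onto the kernel of $\mathcal{F}_\omega$, see (3.10), we have

$$\|a_1^{\otimes\omega} - a_2^{\otimes\omega}\|_2^2 \;=\; \|\mathcal{F}_\omega(a_1^{\otimes\omega} - a_2^{\otimes\omega})\|_2^2 + \|A_{\mathcal{F}_\omega}(a_1^{\otimes\omega} - a_2^{\otimes\omega})\|_2^2$$ (4.17)

equality following because both $\mathcal{F}_\omega$ and $A_{\mathcal{F}_\omega}$ project onto "orthogonal"$^{5}$ spaces, which implies

$$\|\mathcal{F}_\omega(a_1^{\otimes\omega} - a_2^{\otimes\omega})\|_2^2 \;\ge\; (1-\delta)\cdot\|a_1^{\otimes\omega} - a_2^{\otimes\omega}\|_2^2 .$$

Using this in (4.16) and rearranging $(1-\epsilon)(1-\delta)$, this proves that (3.12) is satisfied with required probability, for constant $\epsilon(1-\delta) + \delta > \epsilon$ (the strict inequality follows since $\delta > 0$). The statement of the proposition is satisfied for some probability $\beta \ge k^2 \cdot e^{-\frac{m}{4}(\epsilon^2-\epsilon^3)}$ and rescaling the $\epsilon$ term used here.

□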
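
The two bookkeeping steps at the end of the proof (rescaling the distortion and solving the probability bound for $m$) can be sketched numerically. This is a minimal sketch assuming the exponent $\frac{m}{4}(\epsilon^2-\epsilon^3)$ from Lemma 4.3; the function names and the sample values of `k`, `beta`, `eps`, `delta` are hypothetical.

```python
import math

def rescaled_eps(eps, delta):
    """Distortion eps' used inside Lemma 4.3 so that eps = eps' * (1 - delta) + delta."""
    if not (0 < delta < eps < 1):
        raise ValueError("need 0 < delta < eps < 1")
    return (eps - delta) / (1 - delta)

def exponent_rate(eps, delta):
    """Per-dimension exponent (e^2 - e^3) / 4 evaluated at the rescaled distortion."""
    e = rescaled_eps(eps, delta)
    return (e ** 2 - e ** 3) / 4.0

def min_dim(k, beta, eps, delta):
    """Smallest m with k^2 * exp(-m * rate) <= beta."""
    rate = exponent_rate(eps, delta)
    return math.ceil((2 * math.log(k) + math.log(1 / beta)) / rate)

k, beta, eps, delta = 50, 0.05, 0.5, 0.2   # illustrative values only
m = min_dim(k, beta, eps, delta)
print("rescaled distortion:", rescaled_eps(eps, delta), "embedding dimension:", m)
```

As $\epsilon$ approaches $\delta$ the rescaled distortion tends to zero and the required $m$ blows up, which is the behaviour the text warns about after (3.12).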



The linearity of the $\mathcal{G}$-invariant $\mathcal{F}_\omega$ is very useful for deriving the lower bound (4.17), which admitted the use of orthonormality concepts. It is also useful for deriving the upper bound, since it made it easy to check that $\mathcal{F}_\omega$ is 1-Lipschitz. We are now done with the proofs of both main results.

Remark 4.1. For finite groups, there always exists an invariant satisfying the discriminability hypothesis [14] (albeit with super-exponential complexity, see Theorem SM-I.1 and Fact SM-I.1). However from an embedding complexity standpoint, for any non-sequential data set, (theoretically) one can always find a two-step embedding meeting the guarantees in both Theorems 3.1 and 3.2.

Also, the canonical points in any fundamental region $\mathcal{R}$ have a manifold structure within an algebraic variety (see supplementary material SM-I.2). Hence an interesting future direction is to connect with manifold learning techniques (e.g.,

5 Conclusion 

We present a new extension of linear embeddings for non-sequential data, providing two theorems in the vein of Whitney embedding and the Johnson-Lindenstrauss lemma. We show that accounting for data equivalences can provide savings in embedding dimension up to a factor equal to the size of the invariance group (the savings are logarithmic in the second theorem). The extension was fairly simple, and we appeal to certain linearity properties of invariants.

Acknowledgment

5 Strictly speaking, $\mathcal{F}_\omega$ orthonormally projects onto the (coefficient space) of the complement of its kernel.

A [Appendix] Proofs of Lemmas 4.1, 4.2 and 4.3 appearing in Section 4

Proof. [Proof of Lemma 4.1] The set $A^{-1}(B_m(\epsilon)) \cap B_r(\delta)$ consists of points in $\mathbb{R}^r$ with 2-norm at most $\delta$ that get mapped to points in $\mathbb{R}^m$ with 2-norm at most $\epsilon$. Since $A$ is surjective with smallest singular value $\sigma > 0$, this set of points is contained in a cylindrical subset of $\mathbb{R}^r$, with base dimension $m$, and base radius $\epsilon/\sigma$, see [24]. The volume of this cylindrical subset is at most $(\epsilon/\sigma)^m \delta^{r-m} \cdot \mathrm{vol}(B_m(1)) \cdot \mathrm{vol}(B_{r-m}(1))$, recall we assumed $m \le r$. On the other hand $\mathrm{vol}(B_r(\delta)) = \delta^r\, \mathrm{vol}(B_r(1))$. Using these two facts and also the fact that the $\ell$-dimensional volume $\mathrm{vol}(B_\ell(1)) = \pi^{\ell/2}/(\ell/2)!$, we conclude (4.13).

□
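
The last step of this proof rests on the ratio of unit-ball volumes being at most $2^{r/2}$. A small numerical check of that constant is sketched below; it reads $(\ell/2)!$ as $\Gamma(\ell/2+1)$, and the range of dimensions tested is an arbitrary choice.

```python
import math

def unit_ball_volume(dim):
    """Volume of the unit ball in R^dim: pi^(dim/2) / Gamma(dim/2 + 1)."""
    return math.pi ** (dim / 2) / math.gamma(dim / 2 + 1)

def cylinder_ratio_constant(r, m):
    """vol(B_m(1)) * vol(B_{r-m}(1)) / vol(B_r(1)), the constant left after cancelling radii."""
    return unit_ball_volume(m) * unit_ball_volume(r - m) / unit_ball_volume(r)

# Largest value of constant / 2^(r/2) over a range of dimensions with 1 <= m <= r.
worst = max(
    cylinder_ratio_constant(r, m) / 2 ** (r / 2)
    for r in range(1, 21)
    for m in range(1, r + 1)
)
print("worst ratio relative to 2^(r/2):", worst)
```

A value of `worst` not exceeding 1 is consistent with the factor $2^{r/2}$ in (4.13) for the dimensions tested.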



Proof. [Proof of Lemma 4.2] As we consider $\beta$ with bounded 2-norm, it suffices to replace $\mathbb{R}^r$ with $B_r(0, \delta)$ for any $\delta > 0$, i.e., it suffices to restrict $\|\beta\|_2 \le \delta$, for some $\delta$ specified in the sequel.

For any bounded $\beta$, by assumption $\rho_\beta$ is Lipschitz, thus there exists some constant $C$ such that the image of any $\epsilon$-ball $B_n(\epsilon)$ under $\rho_\beta$ is contained in some $(C\epsilon)$-ball in $\mathbb{R}^m$. For $k^* > 0$, consider $\epsilon^{-k^*}$ number of $n$-dimensional $\epsilon$-balls, denoted $B_n(a_i, \epsilon)$, with various centers $a_i$ in $\mathcal{V}$. If $k^* > k$, we can find $\epsilon^{-k^*}$ such balls that cover the set $\mathcal{V}$ of interest.

Now for each $B_n(a_i, \epsilon)$ in the covering of $\mathcal{V}$, the image of $B_n(a_i, \epsilon)$ under $\rho_\beta$ contains 0, only if $\|\rho_\beta(a_i)\|_2 \le C\epsilon$ for the constants $C$ and $\epsilon$ above. For now, we make the following claim that for any $a \in \mathbb{R}^n$ and some large enough choice for $\delta$

$$\mathrm{vol}\big(\{\beta \in B_r(\delta) : \|\rho_\beta(a)\|_2 \le C\epsilon\}\big) \;\le\; C_1 \epsilon^{m}$$ (A.1)

where $C_1$ is a positive constant. Then for any $\ell > 0$, by a standard argument$^{6}$, the volume of $\beta$ where at least $\epsilon^{-\ell}$ of the $\epsilon^{-k^*}$ images of $B_n(a_i, \epsilon)$ contain 0 (under $\rho_\beta$), is at most $C_1 \epsilon^{m - k^* + \ell}$. In other words, the preimage $\rho_\beta^{-1}(0)$ can be covered by less than $\epsilon^{-\ell}$ number of $\epsilon$-balls, with an exception of maps $\rho_\beta$ for which the volume of the corresponding $\beta$ can be made small if $\ell > k^* - m$ and $\epsilon$ is small. Thus we conclude when $\ell > k^* - m$ and $\epsilon$ goes to 0, we have $\underline{\mathrm{boxdim}}(\rho_\beta^{-1}(0)) \le \ell$ for almost every $\beta$ in $B_r(0, \delta)$. As this holds for all $\ell > k^* - m$, and that $k^*$ can be made arbitrarily close to $k$ for sufficiently small $\epsilon$, see [15], [24], we have $\underline{\mathrm{boxdim}}(\rho_\beta^{-1}(0)) \le k - m$.

We finish the proof by showing the earlier claim (A.1). Associate $\rho_\beta(a)$ with a linear map $A$ as described in the lemma statement, whereby we assumed that $A$ is surjective. Hence, the positive constant $\sigma$ as given in the statement of Lemma 4.1 will exist. We then can apply (4.13) by observing that the volume on the LHS of (A.1) equals the volume $\mathrm{vol}(A^{-1}(B_m(C\epsilon)) \cap B_r(\delta))$, similar to that on the LHS of (4.13) (with $\epsilon$ replaced by $C\epsilon$). Thus for a large enough choice for $\delta$ (where $C/(\sigma\delta) \le 1$), we can find a constant $C_1$ that satisfies (A.1).

□

Proof. [Proof of Lemma 4.3] Express $\|Ax\|_2^2 = x^T(A^T A)x = \sum_{i=1}^{m} |\langle A_i, x\rangle|^2$, where $A_i$ equals the $i$-th row of matrix $A$. Call $Z_i = |\langle A_i, x\rangle|^2$, and observe $\mathbb{E}Z_1 = \mathbb{E}Z_i = \|x\|_2^2$, whereby without loss of generality we assume $\|x\|_2^2 = 1$. We thus want to upper bound the probability $\Pr\{|\sum_{i=1}^{m} Z_i - m| > m\epsilon\}$. We will only consider one side $\Pr\{\sum_{i=1}^{m} Z_i - m > m\epsilon\}$; the other side $\Pr\{\sum_{i=1}^{m} (-Z_i) + m > m\epsilon\}$ can be considered similarly.

By the assumption that $A$ has independent rows, the RVs $Z_i$ are mutually independent. Then by Markov's inequality, for any $\theta > 0$

$$\Pr\Big\{\sum_{i=1}^{m} Z_i - m > m\epsilon\Big\} \;\le\; e^{-m\theta(\epsilon+1)} \cdot \big(\mathbb{E}e^{\theta Z_1}\big)^{m}$$ (A.2)

where we used the fact that the $Z_i$'s are identically distributed. Using the fact that the entries of $A$ are standard normal RVs, then $Z_1$ is chi-squared, and for $\theta < 1/2$ it is a standard result that $\mathbb{E}e^{\theta Z_1} = (1 - 2\theta)^{-1/2}$, so that $(\mathbb{E}e^{\theta Z_1})^m = (1 - 2\theta)^{-m/2}$. Substituting this form in (A.2), we optimize the upper bound over $\theta$, which requires $\theta = \epsilon/(2 + 2\epsilon) < 1/2$. It follows that the LHS probability of (A.2) is at most $[(1 + \epsilon)e^{-\epsilon}]^{m/2}$, and what we wanted to show follows from the bound $1 + \epsilon < \exp(\epsilon - (\epsilon^2 - \epsilon^3)/2)$. □

6 For $n$ events $\mathcal{E}_1, \cdots, \mathcal{E}_n$, we have that the union bound $\sum_{i=1}^{n} \Pr\{\mathcal{E}_i\}$ equals $\sum_{i=1}^{n} \Pr\{\text{at least } i \text{ events } \mathcal{E}_i\}$, see [23], thus we conclude that the union bound is greater than $j \cdot \Pr\{\text{at least } j \text{ events } \mathcal{E}_i\}$ for any $j$, $1 \le j \le n$.

SM-I [Supplementary Material] Background on Invariant theory

SM-I.1 The invariant ring always satisfies the discriminability hypothesis: We expect most readers to be unfamiliar with invariant theory. For their convenience, this first set of supplementary material briefly covers results/facts cited and alluded to in the main text. We begin with the connection of invariant theory to algebraic geometry, the study of polynomial functions/equations. We discuss the invariant ring, i.e., the ring of invariant polynomial functions. We clarify how the $\mathcal{G}$-invariant $\mathcal{F}_\omega$ in (2.4) actually relates to such functions, and hence how the kernel of $\mathcal{F}_\omega$ relates to algebraic varieties. We state a result on separating invariants from Dufresne's thesis (Theorem SM-I.1), that for finite groups the invariant ring has absolute discriminative power. We state the result that the set of canonical points has a manifold structure as an algebraic variety (Theorem SM-I.2). For a good reference text see Cox-Little-O'Shea [12].

We assume some basic ring theory. Denote $\mathbb{R}[Z_1, \cdots, Z_n]$ to be the ring of $n$-variate polynomials over $\mathbb{R}$. For $f \in \mathbb{R}[Z_1, \cdots, Z_n]$, let $f$ denote an $n$-variate polynomial with real coefficients. We think of $f$ as a polynomial function with domain $\mathbb{R}^n$, by letting $f(a_1, \cdots, a_n)$ be the evaluation of $f$ at point $(a_1, \cdots, a_n) \in \mathbb{R}^n$. By the identification of $\mathbb{R}[\mathcal{X}]$ with $\mathbb{R}^n$, we also think of $f$ as a function on $\mathbb{R}[\mathcal{X}]$, for some $\mathcal{G}$-space $\mathcal{X}$ where $\#\mathcal{X} = n$. For some $a \in \mathbb{R}[\mathcal{X}]$, we write the evaluation as $f(a)$. Going back to (2.3), we identify $f_{\Omega_{\mathcal{G}}(\mathcal{X}^{\times\omega})}$ with polynomial functions in $\mathbb{R}[Z_1, \cdots, Z_n]$, as follows. There exists some $f \in \mathbb{R}[Z_1, \cdots, Z_n]$, such that $f_{\Omega_{\mathcal{G}}(\mathcal{X}^{\times\omega})}(a^{\otimes\omega}) = f(a)$ for any $\omega$-th tensor powers $a^{\otimes\omega}$, i.e., if the domain $\mathbb{R}[\mathcal{X}^{\times\omega}]$ of the former function is restricted to tensor powers, then the former function is essentially a polynomial function. This polynomial $f$ that corresponds to $f_{\Omega_{\mathcal{G}}(\mathcal{X}^{\times\omega})}$ must be homogeneous, i.e., all monomials of $f$ must be of degree $\omega$.
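
To make the identification with homogeneous polynomials concrete, the sketch below evaluates orbit invariants for a cyclic group acting on index tuples by shifts. Two assumptions are made for illustration and are not taken from the text: the group is $\mathbb{Z}_n$ acting on $\mathcal{X} = \{0, \dots, n-1\}$, and each orbit invariant is the plain sum of the monomials $a_{x^{(1)}} \cdots a_{x^{(\omega)}}$ over the orbit (the definition in (2.4) may carry a different normalization).

```python
from itertools import product

def cyclic_orbits(n, omega):
    """Orbits of Z_n acting diagonally by shifts on index tuples of length omega."""
    seen, orbits = set(), []
    for tup in product(range(n), repeat=omega):
        if tup in seen:
            continue
        orbit = {tuple((x + g) % n for x in tup) for g in range(n)}
        seen |= orbit
        orbits.append(sorted(orbit))
    return orbits

def orbit_invariants(a, omega):
    """One value per orbit: the sum over the orbit of a[x1] * ... * a[x_omega]."""
    values = []
    for orbit in cyclic_orbits(len(a), omega):
        total = 0.0
        for tup in orbit:
            term = 1.0
            for x in tup:
                term *= a[x]
            total += term
        values.append(total)
    return values

a = [1.0, 2.0, -1.0, 0.5]
shifted = a[1:] + a[:1]                    # the same data after a cyclic shift
original_values = orbit_invariants(a, 2)
shifted_values = orbit_invariants(shifted, 2)
scaled_values = orbit_invariants([2.0 * v for v in a], 2)
print(original_values)
```

Each output entry is a degree-$\omega$ homogeneous polynomial in the entries of `a`, so scaling `a` by 2 scales every entry by $2^{\omega}$, and shifting `a` leaves every entry unchanged.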

By the above association of $\mathcal{G}$-invariants $f_{\Omega_{\mathcal{G}}(\mathcal{X}^{\times\omega})}$ and polynomials $f$, such an $f$ is a $\mathcal{G}$-invariant. We formalize the permutation action$^{7}$ of $\mathcal{G}$ on the polynomial ring $\mathbb{R}[Z_1, \cdots, Z_n]$. Allow $\mathcal{G}$ to permute the variates $Z_i$'s by the identification between $\mathbb{R}^n$ and $\mathbb{R}[\mathcal{X}]$. More specifically for any $g \in \mathcal{G}$, if $f^g$ denotes the polynomial after permuting the variates of $f$, then for any evaluation under $a \in \mathbb{R}[\mathcal{X}]$ we have $(f^g)(a) = f(a^g)$. Hence if the polynomial $f$ is a $\mathcal{G}$-invariant, then $f$ must satisfy $f^g = f$ for all $g \in \mathcal{G}$. Invariant theory is the study of the set $\mathbb{R}[Z_1, \cdots, Z_n]^{\mathcal{G}}$ of all $\mathcal{G}$-invariant polynomials, for some group $\mathcal{G}$. This set is called an invariant ring (of $\mathcal{G}$). Now with reference to the previously discussed polynomial ring $\mathbb{R}[Z_1, \cdots, Z_n]$, note that $\mathbb{R}[Z_1, \cdots, Z_n]^{\mathcal{G}}$ is a subring of $\mathbb{R}[Z_1, \cdots, Z_n]$, and that $\mathbb{R}[Z_1, \cdots, Z_n]^{\mathcal{G}}$ contains the constant polynomials. Also $\mathbb{R}[Z_1, \cdots, Z_n]^{\mathcal{G}}$ is said to be graded, whereby each grade refers to the set of all $\mathcal{G}$-invariant homogeneous polynomials of a certain degree $\omega \ge 0$, see [12], p. 331. We refer to this set of degree-$\omega$ homogeneous polynomials as the $\omega$-th component of $\mathbb{R}[Z_1, \cdots, Z_n]^{\mathcal{G}}$. Clearly, each $\omega$-th component is closed under $\mathbb{R}$-linear combinations. In fact, it is known that each such component can be generated by polynomials $f_1, \cdots, f_{\kappa_\omega}$, each $f_i$ corresponding to the $i$-th orbit invariant $f_{\Omega_{\mathcal{G},i}}$, recall (2.4). It now becomes clear how the $\mathcal{G}$-invariant $\mathcal{F}_\omega$ corresponds to the $\omega$-th component; each "row" of $\mathcal{F}_\omega$ corresponds to a (polynomial) generator. The number $\kappa_\omega$ of generators is computable$^{8}$ by the same equation (2.6).

At this point one realizes that Algorithm 2.1 in Subsection 2.2 proposes to only use one $\omega$-th component. Evaluating $\mathcal{F}_\omega$ only requires polynomial complexity ($n^{\omega}$ operations). But what about the discriminability hypothesis? In the next Supplementary Material SM-II we explain the connection between each $\mathcal{F}_\omega$ and the so-called multi-correlations (related to pattern recognition). In particular for the special case $\omega = 3$, Kakarala has applied representation theoretic methods to obtain so-called completeness results, or in other words a characterization of the discriminability hypothesis under certain conditions. On the other hand if one is willing to consider the entire invariant ring, the discriminability hypothesis is known to be satisfied unconditionally for any subset in $\mathbb{R}[\mathcal{X}]$. We cite the following result in Dufresne's thesis, stated here slightly differently$^{9}$.

7 For simplicity we still focus on permutation actions, though the invariant theoretic results discussed here hold for matrix groups in general.

8 For matrix groups, we have a more general formula based on Molien's Theorem [12], p. 340.

Theorem SM-I.1. (Corollary 3.2.12, [14], p. 26) Let $\mathcal{G}$ be a finite group. Let $\mathcal{X}$ be a finite $\mathcal{G}$-space. Then all $\omega$-th components of the corresponding invariant ring, for all $\omega \le \#\mathcal{G}$, will be discriminable over the whole data space $\mathbb{R}[\mathcal{X}]$. That is for any fundamental region $\mathcal{R}$, for any canonical points $a_1, a_2 \in \mathbb{R}[\mathcal{X}]_{\mathcal{R}}$, $a_1 \neq a_2$, there exists some $\mathcal{G}$-invariant $f$ in $\mathbb{R}[Z_1, \cdots, Z_n]^{\mathcal{G}}$ with degree at most $\#\mathcal{G}$, such that $f(a_1) \neq f(a_2)$.

Recall each $\mathcal{F}_\omega$ corresponds to the $\omega$-th component. Hence if all $\mathcal{G}$-invariants $\mathcal{F}_\omega$, for all $\omega \le \#\mathcal{G}$, are appropriately made to form a single $\mathcal{G}$-invariant, then such a $\mathcal{G}$-invariant will be discriminable over any data set $\mathcal{V}$. This leads to the following important observation.

Fact SM-I.1. The discriminability hypothesis can always be satisfied with large enough computational complexity: There exists a single $\mathcal{G}$-invariant corresponding to $\omega$-components, $\omega \le \#\mathcal{G}$, that for any data set $\mathcal{V} \subset \mathbb{R}[\mathcal{X}]$, satisfies the discriminability hypothesis in both our Whitney embedding Theorem 3.1 and Johnson-Lindenstrauss Theorem 3.2.

This implies that any bounded, non-sequential data set $\mathcal{V}$ can be appropriately embedded with embedding dimension $m$ tied only to its relevant size $k$.

However, such an invariant requires $O(n^{\#\mathcal{G}})$ complexity to compute, exponential in the size of $\mathcal{G}$, which is clearly infeasible in practice for most group sizes.



It is not yet known if the size requirements on $\omega$ in Theorem SM-I.1 are necessary (in certain cases they can be improved). Now since the same theorem holds for all of $\mathbb{R}[\mathcal{X}]$, one meaningful approach would be to relax this requirement, and only consider specific subsets of $\mathbb{R}[\mathcal{X}]$. Kakarala adopts a similar strategy for triple-correlations, by obtaining completeness results under certain assumed data conditions (see second set of supplementary material).

SM-I.2 The set of canonical points includes a manifold structure: Another beautiful aspect of invariant theory is due to its connection with algebraic geometry. In particular, there is a remarkable explanation of how the set of all canonical points has a manifold-like structure, in the form of an affine algebraic variety [12], pp. 345-353. An (affine) algebraic variety is a set of points for which there exists a set of polynomial equations that is satisfied by every point in this set. For example, the kernel of the $\mathcal{G}$-invariant $\mathcal{F}_\omega$ in (2.4) is related to the following algebraic variety

$$\{a \in \mathbb{R}[\mathcal{X}] : f_i(a) = 0,\ 1 \le i \le \kappa_\omega\},$$ (SM-I.1)

where $f_i(a) = f_{\Omega_{\mathcal{G},i}(\mathcal{X}^{\times\omega})}(a^{\otimes\omega})$. For the same polynomials $f_i$, the following set is also an algebraic variety

$$\{(a_1, a_2) \in \mathbb{R}[\mathcal{X}] \times \mathbb{R}[\mathcal{X}] : f_i(a_1) - f_i(a_2) = 0,\ 1 \le i \le \kappa_\omega\},$$ (SM-I.2)

whereby this second set (SM-I.2) contains pairs of points in $\mathbb{R}[\mathcal{X}]$ that cannot be discriminated by the $\mathcal{G}$-invariant $\mathcal{F}_\omega$. In theory, the set could be computed by elimination theory using a Gröbner basis, see [12], ch. 3, which would yield useful characterizations of such pairs of points $(a_1, a_2)$. Though such an approach can be unwieldy for large $n$, it does suggest a possible algebraic geometry view of characterizing discriminability of invariants, besides the representation theoretic techniques of Kakarala. Also Kakarala's techniques currently only hold for triple-correlations (i.e., $\omega = 3$), whereas here $\omega$ could be arbitrary.

9 The statement in [14] uses a stronger notion of discriminability, called a geometric separating set, see Definition 3.2.1, p. 15. Also it holds for general matrix groups.

The algebraic variety structure of the set of canonical points is a little more complicated to explain, and requires the algebraic closure of $\mathbb{R}$ to the complex field $\mathbb{C}$. Take a generating set of the invariant ring $\mathbb{C}[Z_1, \cdots, Z_n]^{\mathcal{G}}$ over $\mathbb{C}$, say $f_1, \cdots, f_\ell$ for some $\ell \ge 1$, and form a map $\rho : \mathbb{C}[\mathcal{X}] \to \mathbb{C}^{\ell} : a \mapsto (f_1(a), \cdots, f_\ell(a))$, where $\mathbb{C}[\mathcal{X}]$ is the complexification of $\mathbb{R}[\mathcal{X}]$. Recall the notation $\mathbb{C}[\mathcal{X}]_{\mathcal{R}}$, which means a set of canonical points in $\mathbb{C}[\mathcal{X}]$ lying in some fundamental region $\mathcal{R}$. There exists an invariant theoretic result that says that $\mathbb{C}[\mathcal{X}]_{\mathcal{R}}$ is in bijection with the image of $\rho$, whereby this image is actually an algebraic variety. The set of polynomial equations that describe the image comes from the generators of a special ideal of the ring $\mathbb{C}[Y_1, \cdots, Y_\ell]$ of $\ell$-variate polynomials, where $\ell$ is the number of generators $f_i$ of the invariant ring. This ideal, known as the ideal of relations, contains all $\beta$ in $\mathbb{C}[Y_1, \cdots, Y_\ell]$ whereby $\beta(f_1, \cdots, f_\ell)$ is identically zero; here $\beta(f_1, \cdots, f_\ell)$ is thought of as a polynomial in the variates $Z_i$'s. This result is stated as follows.

Theorem SM-I.2. (Theorem 10, [12], p. 351) Let $f_1, \cdots, f_\ell$ generate the invariant ring $\mathbb{C}[Z_1, \cdots, Z_n]^{\mathcal{G}}$, for some $\ell \ge 1$. Let $\rho : \mathbb{C}[\mathcal{X}] \to \mathbb{C}^{\ell} : a \mapsto (f_1(a), \cdots, f_\ell(a))$.

Let $\beta_1, \cdots, \beta_r$ generate the ideal of relations in the ring $\mathbb{C}[Y_1, \cdots, Y_\ell]$, for some $r \ge 1$. Consider the algebraic variety

$$\{(b_1, \cdots, b_\ell) \in \mathbb{C}^{\ell} : \beta_i(b_1, \cdots, b_\ell) = 0,\ 1 \le i \le r\}$$ (SM-I.3)

Then the map $\rho$ is surjective onto the algebraic variety (SM-I.3). In fact if we restrict $\rho$ over the domain $\mathbb{C}[\mathcal{X}]_{\mathcal{R}}$ for any fundamental region $\mathcal{R}$, then $\rho$ with this restriction of domain becomes bijective.



Theorem SM-I.2 remarkably shows how the set of canonical points, after passing through this map $\rho$, has the manifold structure of the algebraic variety (SM-I.3). This brings to mind the possibility of applying manifold learning techniques to learn the canonical points. However until one derives an analogue of Theorem SM-I.2 for the reals, one needs to work in $\mathbb{C}$.

SM-II [Supplementary Material] Completeness results for triple-correlation 

SM-II.1 Multi-correlations are connected with invariant theory: Auto- and triple-correlation functions have been employed as invariants in pattern recognition [17]-[19], though the presentation has always been disparate from invariant theory. The first goal of this second set of supplementary material is to provide unification. We begin by clarifying how a generalization of such functions (that we call multi-correlations) is one and the same as the graded components of the invariant ring (see previous Supplementary Material SM-I). Next, for the sake of readers not familiar with Kakarala's completeness results for the triple-correlation, we provide a primer in Subsection SM-II.2.



For correlation functions studied in pattern recognition, the group action is limited to transitive permutation actions. Recall the two examples given in Subsection 2.1. For this special case, the $\mathcal{G}$-space $\mathcal{X}$ is referred to as a homogeneous space. To explain correlations, we require the following notion of $\mathcal{G}$ itself as a homogeneous space.

Example. [$\mathcal{G}$ as a homogeneous space]: For an abstract group $\mathcal{G}$, define an action of $\mathcal{G}$ on itself, where for any $g \in \mathcal{G}$, we have the image $g(\sigma) = g\sigma$ for any $\sigma \in \mathcal{G}$, i.e., $\mathcal{G}$ acts on itself by left multiplication. This is a transitive action, so $\mathcal{G}$ (as a set) is a homogeneous space.



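The last example admits discussion of the vector space $\mathbb{R}[\mathcal{G}]$; we consider $\mathcal{G}$ as the set $\mathcal{X}$. Let $z$ denote an element in $\mathbb{R}[\mathcal{G}]$, where $z_g$ denotes an indexed element of $z$ for $g \in \mathcal{G}$. For any $z \in \mathbb{R}[\mathcal{G}]$, the multi-correlation $A_z^{(\omega)}$ for some $\omega > 1$, is given as

$$A_z^{(\omega)}(g_1, \cdots, g_{\omega-1}) = \sum_{\sigma \in \mathcal{G}} z_\sigma \cdot z_{\sigma g_1} \cdots z_{\sigma g_{\omega-1}}$$ (SM-II.4)

where for $j$, $1 \le j < \omega$ we have $g_j \in \mathcal{G}$. The cases $\omega = 2$ and $\omega = 3$ specialize respectively to the auto- and triple-correlations. For any $\omega > 1$, the function $A_z^{(\omega)}$ is a $\mathcal{G}$-invariant, i.e., for any $\alpha \in \mathcal{G}$, we have $A_{z^{\alpha}}^{(\omega)} = A_z^{(\omega)}$; to verify this, simply evaluate (SM-II.4) with $z^{\alpha}$ and put $(z^{\alpha})_\sigma = z_{\alpha^{-1}\sigma}$ for any $\sigma \in \mathcal{G}$.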
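
The definition (SM-II.4) and its invariance can be illustrated for the simplest case. The following sketch assumes the cyclic group $\mathbb{Z}_n$ written additively, so that the product $\sigma g$ becomes $(\sigma + g) \bmod n$; the signal `z` and the shift amount are arbitrary illustrative choices.

```python
from itertools import product

def multi_correlation(z, omega):
    """Map (g_1, ..., g_{omega-1}) to sum_s z[s] * z[s+g_1] * ... * z[s+g_{omega-1}] over Z_n."""
    n = len(z)
    table = {}
    for gs in product(range(n), repeat=omega - 1):
        total = 0.0
        for s in range(n):
            term = z[s]
            for g in gs:
                term *= z[(s + g) % n]
            total += term
        table[gs] = total
    return table

z = [1.0, 2.0, -1.0, 0.5, 3.0]
shifted = z[2:] + z[:2]                      # z acted on by a group element (cyclic shift)

auto = multi_correlation(z, 2)               # omega = 2: auto-correlation
triple = multi_correlation(z, 3)             # omega = 3: triple-correlation
triple_shifted = multi_correlation(shifted, 3)
print(auto[(0,)], triple[(1, 2)], triple_shifted[(1, 2)])
```

Shifting `z` only reorders the terms of the sum over $\sigma$, so every entry of the table is unchanged, which is the invariance claimed above.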

While the (correlation) functions (SM-II.4) seem to be only defined for the space $\mathbb{R}[\mathcal{G}]$, we can accommodate any $\mathcal{G}$-space $\mathcal{X}$, by extending elements in $\mathbb{R}[\mathcal{X}]$ to $\mathbb{R}[\mathcal{G}]$. Let $x_1$ denote an element in $\mathcal{X}$ that has been (arbitrarily) chosen and fixed. Using this $x_1$, then for any $a \in \mathbb{R}[\mathcal{X}]$, the extension of $a$, denoted $\tilde{a}$, satisfies

$$\tilde{a}_g = a_{g(x_1)}, \text{ for all } g \in \mathcal{G}.$$ (SM-II.5)

The stabilizer of the fixed element $x_1$, denoted $\mathcal{S}_{x_1}$, is the set of group elements in $\mathcal{G}$ that leave $x_1$ unmoved, i.e., $\mathcal{S}_{x_1} = \{g \in \mathcal{G} : g(x_1) = x_1\}$. Clearly $\mathcal{S}_{x_1}$ will be a subgroup of $\mathcal{G}$. Since we do not discuss other stabilizer subgroups in the sequel, we will drop the subscript $x_1$ from $\mathcal{S}_{x_1}$ and simply write $\mathcal{S}$ throughout. The relationship (SM-II.5) relates $\mathcal{S}$ to extensions of vectors in $\mathbb{R}[\mathcal{X}]$, whereby note that any extension $\tilde{a}$ is constant over left-cosets of $\mathcal{S}$ in $\mathcal{G}$, i.e., for any $g \in \mathcal{G}$, we have $\tilde{a}_{gs} = \tilde{a}_g$ for any $s \in \mathcal{S}$. Hence when considering homogeneous spaces $\mathcal{X}$ we only need to evaluate (SM-II.4) (for $A_{\tilde{a}}^{(\omega)}$ where $a \in \mathbb{R}[\mathcal{X}]$) at points $\{(t_{i_1}, \cdots, t_{i_{\omega-1}}) : 1 \le i_1, \cdots, i_{\omega-1} \le n\}$, where each $t_j$ is a left-coset representative. There are at most $n^{\omega-1}$ such points, where $n = \#\mathcal{X}$. For the previously fixed $x_1$, enumerate the rest of the elements in $\mathcal{X}$ as $x_2, x_3, \cdots, x_n$, and fix $t_j$ to send $x_1$ to $x_j$ (possible only when $\mathcal{G}$ acts transitively on $\mathcal{X}$). Note $n = \#\mathcal{X} = \#\mathcal{G}/\#\mathcal{S}$. To conclude, extensions allow us to synonymously discuss correlations for $\mathbb{R}[\mathcal{G}]$, and $\mathbb{R}[\mathcal{X}]$ for any homogeneous $\mathcal{G}$-space $\mathcal{X}$.
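
The extension (SM-II.5) and its constancy on left-cosets can be checked on a small example. The sketch below assumes the symmetric group on three points acting on $\mathcal{X} = \{0, 1, 2\}$, with permutations stored as tuples; this choice of group, the data `a`, and the helper names are illustrative only.

```python
from itertools import permutations

X = (0, 1, 2)
G = list(permutations(X))            # g is a tuple with g[x] = image of x under g

def compose(g, h):
    """Group product: (g * h)(x) = g(h(x))."""
    return tuple(g[h[x]] for x in X)

def extend(a, x1=0):
    """Extension of a from R[X] to R[G]: a_tilde[g] = a[g(x1)]."""
    return {g: a[g[x1]] for g in G}

a = [10.0, 20.0, 30.0]
x1 = 0
a_tilde = extend(a, x1)
S = [g for g in G if g[x1] == x1]    # stabilizer subgroup of x1

constant_on_cosets = all(
    a_tilde[compose(g, s)] == a_tilde[g] for g in G for s in S
)
print(len(G), len(S), constant_on_cosets)
```

Here $\#\mathcal{G}/\#\mathcal{S} = 6/2 = 3 = \#\mathcal{X}$, matching the count of left-coset representatives used in the text.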



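We proceed to show how the multi-correlation (SM-II.4) for some $\omega > 1$ is related to the $\omega$-th component of the invariant ring. We do this by specifying the connection with the $\mathcal{G}$-invariant $\mathcal{F}_\omega$ in (2.4), which was already established to "generate" the $\omega$-th degree polynomials in the ring. For any $a \in \mathbb{R}[\mathcal{X}]$, we calculate the multi-correlation $A_{\tilde{a}}^{(\omega)}$ as follows

$$\begin{aligned}
A_{\tilde{a}}^{(\omega)}(t_{i_1}, \cdots, t_{i_{\omega-1}})
&= \sum_{\sigma \in \mathcal{G}} \tilde{a}_\sigma \cdot \tilde{a}_{\sigma t_{i_1}} \cdots \tilde{a}_{\sigma t_{i_{\omega-1}}} \\
&\overset{(a)}{=} \sum_{j=1}^{n} \sum_{s \in \mathcal{S}} \tilde{a}_{t_j s} \cdot \tilde{a}_{t_j s t_{i_1}} \cdots \tilde{a}_{t_j s t_{i_{\omega-1}}} \\
&\overset{(b)}{=} \sum_{j=1}^{n} \sum_{s \in \mathcal{S}} a_{t_j(x_1)} \cdot a_{(t_j s)(x_{i_1})} \cdots a_{(t_j s)(x_{i_{\omega-1}})} \\
&= \sum_{j=1}^{n} a_{x_j} \sum_{s \in \mathcal{S}} a_{(t_j s)(x_{i_1})} \cdots a_{(t_j s)(x_{i_{\omega-1}})}
\end{aligned}$$ (SM-II.6)

where in (a) we apply $\sigma = t_j s$ for some $t_j$, in (b) we apply (SM-II.5) and $\tilde{a}_{t_j s} = a_{(t_j s)(x_1)} = a_{t_j(x_1)} = a_{x_j}$, and the last equality follows by definition $t_j(x_1) = x_j$. We notice the following from the final expression (SM-II.6). For each $j$, $1 \le j \le n$, the second summation really runs over indexes over $\mathcal{X}^{\times(\omega-1)}$ in the set $\{t_j(x^{(1:\omega-1)}) : x^{(1:\omega-1)} \in \Omega_{\mathcal{S}}(\mathcal{X}^{\times(\omega-1)})\}$, where $\Omega_{\mathcal{S}}(\mathcal{X}^{\times(\omega-1)})$ is the $\mathcal{S}$-orbit (over $\mathcal{X}^{\times(\omega-1)}$) that contains $(x_{i_1}, \cdots, x_{i_{\omega-1}})$. The LHS and RHS of (SM-II.6) are really determined by the indices $i_1, \cdots, i_{\omega-1}$, for at most $n^{\omega-1}$ such choices.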



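We notice the following connection between the final expression in (SM-II.6) and the $\mathcal{G}$-invariant $\mathcal{F}_\omega$ as applied in Algorithm 2.1. First, there is a one-to-one correspondence between $\mathcal{G}$-orbits on $\mathcal{X}^{\times\omega}$, and $\mathcal{S}$-orbits on $\mathcal{X}^{\times(\omega-1)}$. This correspondence is obtained for $\Omega_{\mathcal{G}}(\mathcal{X}^{\times\omega})$, by identifying $\Omega_{\mathcal{S}}(\mathcal{X}^{\times(\omega-1)})$ with the subset $\{x^{(1:\omega-1)} : (x^{(1:\omega-1)}, x_1) \in \Omega\}$ of $\mathcal{X}^{\times(\omega-1)}$. Secondly for any $\mathcal{G}$-orbit $\Omega_{\mathcal{G}}(\mathcal{X}^{\times\omega})$ on $\mathcal{X}^{\times\omega}$, by the corresponding $\omega$-array $[b_{x^{(1:\omega)}}]$ in (2.1), we can express (see (2.4))

$$f_{\Omega_{\mathcal{G}}(\mathcal{X}^{\times\omega})}(a^{\otimes\omega}) = \sum_{j=1}^{n} a_{x_j} \sum_{(x^{(1)}, \cdots, x^{(\omega-1)}) \in \Omega'_j} a_{x^{(1)}} \cdots a_{x^{(\omega-1)}}$$

where for each $j$, $1 \le j \le n$, we have $\Omega'_j = \{x^{(1:\omega-1)} : (x^{(1:\omega-1)}, x_j) \in \Omega\}$. Note that $\Omega'_j$ is simply an orbit of the subgroup $t_j \mathcal{S} t_j^{-1}$ that stabilizes $x_j$, whereby $\Omega'_1 = \Omega_{\mathcal{S}}(\mathcal{X}^{\times(\omega-1)})$, the $\mathcal{S}$-orbit previously identified with the $\mathcal{G}$-orbit $\Omega$. Recall from the proof of Proposition 2.1 that the $(t_j \mathcal{S} t_j^{-1})$-orbit is simply the set $\{t_j(x^{(1:\omega-1)}) : x^{(1:\omega-1)} \in \Omega_{\mathcal{S}}(\mathcal{X}^{\times(\omega-1)})\}$. Finally, compare with (SM-II.6) by taking $(x_{i_1}, \cdots, x_{i_{\omega-1}}) \in \Omega_{\mathcal{S}}(\mathcal{X}^{\times(\omega-1)})$ (determined by the indices $i_1, \cdots, i_{\omega-1}$), and conclude the following result.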



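Proposition SM-II.1. Let $a \in \mathbb{R}[\mathcal{X}]$. Let $\Omega_{\mathcal{S},1}(\mathcal{X}^{\times(\omega-1)}), \cdots, \Omega_{\mathcal{S},\kappa_\omega}(\mathcal{X}^{\times(\omega-1)})$ denote the $\kappa_\omega$ number of $\mathcal{S}$-orbits on $\mathcal{X}^{\times(\omega-1)}$. Then firstly for an extension $\tilde{a}$, the multi-correlation $A_{\tilde{a}}^{(\omega)}$ has at most $\kappa_\omega$ unique evaluations, found at the points $(t_{i_1}, \cdots, t_{i_{\omega-1}})$ corresponding to the representatives $(x_{i_1}, \cdots, x_{i_{\omega-1}})$ of the $\mathcal{S}$-orbits.

Secondly, the output $\mathcal{F}_\omega(a^{\otimes\omega})$ of Algorithm 2.1 is equivalent to the multi-correlation $A_{\tilde{a}}^{(\omega)}$ for the extension $\tilde{a}$, whereby evaluation at the point $(t_{i_1}, \cdots, t_{i_{\omega-1}})$ corresponding to $(x_{i_1}, \cdots, x_{i_{\omega-1}})$, is equal to the value of $f_{\Omega_{\mathcal{G}}(\mathcal{X}^{\times\omega})}(a^{\otimes\omega})$, see (2.4); where the $\mathcal{G}$-orbit $\Omega_{\mathcal{G}}(\mathcal{X}^{\times\omega})$ corresponds to the $\mathcal{S}$-orbit that contains $(x_{i_1}, \cdots, x_{i_{\omega-1}})$.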

The second part of Proposition SM-II.1 proves the intended equivalence between the $\mathcal{G}$-invariants in (2.4) and the multi-correlations. This proposition establishes a connection between Kakarala's representation theoretic analysis, discussed in the sequel, and the invariant theory discussed in Supplementary Material SM-I.
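
The equivalence can be checked numerically in the simplest setting. The sketch below assumes the cyclic group $\mathbb{Z}_n$ acting on itself by shifts, so that the stabilizer is trivial, $t_j$ is the shift by $j$, and the extension of $z$ is $z$ itself; it also assumes the orbit invariant is the plain sum of monomials over the orbit (the normalization in (2.4) may differ). Function names are illustrative.

```python
from itertools import product

def multi_correlation_at(z, gs):
    """Evaluate sum_s z[s] * z[s+g_1] * ... * z[s+g_{omega-1}] over Z_n."""
    n = len(z)
    total = 0.0
    for s in range(n):
        term = z[s]
        for g in gs:
            term *= z[(s + g) % n]
        total += term
    return total

def orbit_sum(z, representative):
    """Sum of monomials over the Z_n-orbit (diagonal shifts) of an index tuple."""
    n = len(z)
    orbit = {tuple((x + g) % n for x in representative) for g in range(n)}
    total = 0.0
    for tup in orbit:
        term = 1.0
        for x in tup:
            term *= z[x]
        total += term
    return total

z = [1.0, -2.0, 0.5, 3.0]
omega = 3
# The orbit paired with (g_1, ..., g_{omega-1}) is the one containing (g_1, ..., g_{omega-1}, x_1), x_1 = 0.
max_gap = max(
    abs(multi_correlation_at(z, gs) - orbit_sum(z, gs + (0,)))
    for gs in product(range(len(z)), repeat=omega - 1)
)
print("largest discrepancy:", max_gap)
```

For groups with a nontrivial stabilizer the same comparison would need the coset representatives $t_j$ and the extension $\tilde{a}$, which this sketch deliberately avoids.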



SM-II.2 Kakarala's completeness results for triple-correlation: This subsection provides a brief introduction to representation theoretic techniques for showing completeness of the triple correlation. We discuss a constructive algorithm for finite cyclic groups (which more generally also applies to finite abelian groups), and Kakarala's completeness result for compact groups. Note that compact groups include finite groups under the discrete topology. Good references to this material include the textbook [25], and Kakarala's and Kondor's theses [17], [18].

Here we let $V$ denote a finite-dimensional vector space. A representation of a group $\mathcal{G}$ over $V$ is an action of $\mathcal{G}$ on the vector space $V$; for any $g \in \mathcal{G}$, each $z \in V$ is sent to $\rho(g)z$, whereby any $\rho(g)$ is an invertible linear map. For example suppose $V = \mathbb{R}[\mathcal{G}]$, and for $g \in \mathcal{G}$ set $\rho(g)$ to be a 0-1 matrix in $\mathbb{R}^{\#\mathcal{G} \times \#\mathcal{G}}$ whose $(h, \sigma)$-th element $(\rho(g))_{h,\sigma}$ equals 1 if and only if $h = g\sigma$. This representation, called the left-regular representation, is in fact related to the previous example of $\mathcal{G}$ acting on itself (i.e., $\mathcal{G}$ is a homogeneous $\mathcal{G}$-space).
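
The left-regular representation described above can be built explicitly for a small group. The sketch below assumes the symmetric group on three points, with permutations stored as tuples; the group choice and helper names are illustrative. It checks the defining property of a representation, $\rho(g)\rho(h) = \rho(gh)$.

```python
from itertools import permutations

G = list(permutations(range(3)))
index = {g: i for i, g in enumerate(G)}

def compose(g, h):
    """Group product: (g * h)(x) = g(h(x))."""
    return tuple(g[h[x]] for x in range(3))

def left_regular(g):
    """0-1 matrix with entry (h, sigma) equal to 1 exactly when h = g * sigma."""
    size = len(G)
    mat = [[0] * size for _ in range(size)]
    for sigma in G:
        mat[index[compose(g, sigma)]][index[sigma]] = 1
    return mat

def matmul(a, b):
    """Plain integer matrix product of two square matrices."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

is_homomorphism = all(
    matmul(left_regular(g), left_regular(h)) == left_regular(compose(g, h))
    for g in G for h in G
)
print("rho(g) rho(h) == rho(gh) for all g, h:", is_homomorphism)
```

Each $\rho(g)$ is a permutation matrix, so it is invertible and preserves the standard inner product on $\mathbb{R}[\mathcal{G}]$.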

A representation $(\rho, V)$ is said to be irreducible, if the only subspaces of $V$ invariant under the representation action are trivial (i.e., the invariant subspace equals either $\{0\}$ or $V$). A unitary representation $(\rho, V)$ preserves the inner product on $V$, i.e., for all $g \in \mathcal{G}$ we have $\langle \rho(g)z, \rho(g)z' \rangle = \langle z, z' \rangle$ for any $z, z' \in V$. Two representations $(\rho_1, V_1)$ and $(\rho_2, V_2)$ are said to be equivalent, if there exists a linear bijection $A : V_2 \to V_1$ such that $\rho_1(g)A = A\rho_2(g)$ for all $g \in \mathcal{G}$. The dual of a finite group $\mathcal{G}$, denoted $\hat{\mathcal{G}}$, is the complete set of irreducible pairwise non-equivalent (unitary) representations of $\mathcal{G}$. If $\mathcal{G}$ is finite then so is $\hat{\mathcal{G}}$. The machinery to obtain $\hat{\mathcal{G}}$, from the left-regular representation, is given by the Peter-Weyl theorem (see [25], pp. 85-86, for the statement for finite $\mathcal{G}$). The following is the analogue of the Fourier transform, stated for finite $\mathcal{G}$.

Definition SM-II.1. (c.f. [25], p. 99) Let $z \in \mathbb{R}[\mathcal{G}]$. Let $\mathcal{G}$ be a finite group with finite dual $\hat{\mathcal{G}}$. The (abstract) Fourier transform component of $z$ with respect to an irreducible (unitary) representation $(\rho, V)$, is the linear operator $\hat{z}(\rho) : V \to V$ defined by

$$\hat{z}(\rho) = \sum_{g \in \mathcal{G}} z_g \cdot \rho(g).$$ (SM-II.7)

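The techniques here will be closely related to this Fourier transform. In what follows, we need to
consider the product group $\mathcal{G} \times \mathcal{G}$, and its dual $\widehat{\mathcal{G} \times \mathcal{G}}$. Here, each $(\rho, V) \in \widehat{\mathcal{G} \times \mathcal{G}}$ has maps $\rho(g, h)$
indexed by an element pair $g, h \in \mathcal{G}$. For the triple correlation $A_z^{(3)}$ of any $z \in \mathbb{R}[\mathcal{G}]$, we now
elucidate an illuminating structure of a Fourier transform component, specially$^{10}$ denoted $B_z(\rho)$.
Consider two elements $z_1, z_2 \in \mathbb{R}[\mathcal{G} \times \mathcal{G}]$ related to $z \in \mathbb{R}[\mathcal{G}]$, as follows. For $z_1$, set $(z_1)_{(g,g)} = z_g$
for all $g \in \mathcal{G}$ and $(z_1)_{(g,h)} = 0$ when $h \neq g$. For $z_2$, set $(z_2)_{(g,h)} = z_g z_h$ for all $g, h \in \mathcal{G}$. Let $\dagger$ denote
complex conjugation (conjugate transposition, for operators). Then for any $(\rho, V) \in \widehat{\mathcal{G} \times \mathcal{G}}$, we see that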



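$$
\begin{aligned}
(\hat{z}_1(\rho))^{\dagger} \, \hat{z}_2(\rho)
&= \Big( \sum_{a \in \mathcal{G}} z_a \cdot \rho(a, a) \Big)^{\dagger} \Big( \sum_{g,h \in \mathcal{G}} z_g z_h \cdot \rho(g, h) \Big) \\
&= \sum_{a \in \mathcal{G}} \sum_{g,h \in \mathcal{G}} z_a z_g z_h \cdot \rho(a^{-1}g, a^{-1}h) \\
&= \sum_{g,h \in \mathcal{G}} \sum_{a \in \mathcal{G}} z_a z_{ag} z_{ah} \cdot \rho(g, h) \\
&= \sum_{g,h \in \mathcal{G}} A_z^{(3)}(g, h) \cdot \rho(g, h) = B_z(\rho), \qquad \text{(SM-II.8)}
\end{aligned}
$$

where the second last equality follows from the definition (SM-II.4) of the triple correlation $A_z^{(3)}$.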
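The identity (SM-II.8) can be checked numerically in the simplest setting. The sketch below does so for the cyclic group of order $n$, where every $\rho \in \widehat{\mathcal{G} \times \mathcal{G}}$ is a product of two characters $\chi_k, \chi_l$, so that $\hat{z}_1$ reduces to $\hat{z}(\chi_{k+l})$ and $\hat{z}_2$ to $\hat{z}(\chi_k)\hat{z}(\chi_l)$. We assume the triple correlation has the form $A_z^{(3)}(g,h) = \sum_a z_a z_{a+g} z_{a+h}$ in additive notation, which is how we read (SM-II.4); the function names and the sample data are ours.

```python
import cmath
import math


def character(n, k, g):
    """Irreducible representation chi_k of Z_n: chi_k(g) = exp(2*pi*i*k*g/n)."""
    return cmath.exp(2j * math.pi * k * g / n)


def fourier(z):
    """Fourier components zhat(chi_k) = sum_g z_g * chi_k(g)."""
    n = len(z)
    return [sum(z[g] * character(n, k, g) for g in range(n)) for k in range(n)]


def triple_correlation(z):
    """Assumed form of (SM-II.4) on Z_n: A(g, h) = sum_a z_a z_{a+g} z_{a+h}."""
    n = len(z)
    return [[sum(z[a] * z[(a + g) % n] * z[(a + h) % n] for a in range(n))
             for h in range(n)] for g in range(n)]


z = [3.0, 1.0, 4.0, 1.0, 5.0]
n = len(z)
zhat = fourier(z)
A = triple_correlation(z)

max_gap = 0.0
for k in range(n):
    for l in range(n):
        # Fourier transform of the triple correlation at chi_k (x) chi_l.
        rhs = sum(A[g][h] * character(n, k, g) * character(n, l, h)
                  for g in range(n) for h in range(n))
        # (z1hat)^dagger * z2hat, with z1hat = zhat(chi_{k+l}) and
        # z2hat = zhat(chi_k) * zhat(chi_l) in the cyclic case.
        lhs = zhat[(k + l) % n].conjugate() * zhat[k] * zhat[l]
        max_gap = max(max_gap, abs(lhs - rhs))
```

The two sides agree up to floating-point error for every pair $(k, l)$, which is the cyclic-group instance of $B_z(\rho) = (\hat{z}_1(\rho))^{\dagger} \hat{z}_2(\rho)$.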



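We proceed to further manipulate the LHS of (SM-II.8). Each $(\rho, V)$ in $\widehat{\mathcal{G} \times \mathcal{G}}$ can be expressed
as $(\rho_1 \otimes \rho_2, V_1 \otimes V_2)$, where $(\rho_1, V_1), (\rho_2, V_2) \in \hat{\mathcal{G}}$ and $\rho(g, h) = \rho_1(g) \otimes \rho_2(h)$, see [25], p. 272.
Thus for $\hat{z}_2(\rho)$ in (SM-II.8), with $\rho = \rho_1 \otimes \rho_2$, we conclude

$$\hat{z}_2(\rho_1 \otimes \rho_2) = \hat{z}(\rho_1) \otimes \hat{z}(\rho_2), \qquad \text{(SM-II.9)}$$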

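where the RHS consists of two Fourier transforms of $z$ in $\mathbb{R}[\mathcal{G}]$, corresponding to representations
$(\rho_1, V_1), (\rho_2, V_2) \in \hat{\mathcal{G}}$. Next we require the notion of a direct sum representation $(\rho_1 \oplus \rho_2, V_1 \oplus V_2)^{11}$
of two representations $(\rho_1, V_1)$ and $(\rho_2, V_2)$ of $\mathcal{G}$, where $V_1, V_2$ are orthogonal. In the direct sum,
for all $g \in \mathcal{G}$, we mean that $\rho_1(g)$ leaves $V_2$ invariant, and $\rho_2(g)$ leaves $V_1$ invariant. The tensor
product representation $\rho_1 \otimes \rho_2$ can be expressed as direct sums of representations in $\hat{\mathcal{G}}$, i.e.,

$$\rho_1 \otimes \rho_2 \cong \bigoplus_{\varrho \in \hat{\mathcal{G}}} \varrho^{\oplus m_{\rho_1,\rho_2}(\varrho)}, \qquad \text{(SM-II.10)}$$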

where $\cong$ denotes equivalence in representations (under some linear operator $A_{\rho_1 \otimes \rho_2} : V \to V'$ where
$V'$ is some subspace of $\mathbb{R}[\mathcal{G}]$), the notation $\varrho^{\oplus \ell}$ for $\varrho \in \hat{\mathcal{G}}$, $\ell \in \mathbb{Z}$, means the representation
$\varrho \oplus \cdots \oplus \varrho$ formed by $\ell$ copies of $\varrho$, and finally $m_{\rho_1,\rho_2} : \hat{\mathcal{G}} \to \mathbb{Z}$ returns for each $\varrho$ in $\hat{\mathcal{G}}$ the number
of copies in the tensor product.

10 The B stands for bi-spectrum, a term for the (2-dimensional) Fourier transform of the triple correlation.
11 The direct sum $V_1 \oplus V_2$ of vector spaces equals $\{v_1 + v_2 : v_1 \in V_1, v_2 \in V_2\}$.

From (SM-II.10) we can conclude for $\hat{z}_1(\rho)$ in (SM-II.8), where $\rho = \rho_1 \otimes \rho_2$,

$$\hat{z}_1(\rho_1 \otimes \rho_2) \cong \bigoplus_{\varrho \in \hat{\mathcal{G}}} (\hat{z}(\varrho))^{\oplus m_{\rho_1,\rho_2}(\varrho)}, \qquad \text{(SM-II.11)}$$
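where $\cong$ means the same equivalence as earlier in (SM-II.10). By the identity $B_z(\rho) = (\hat{z}_1(\rho))^{\dagger} \hat{z}_2(\rho)$
developed in (SM-II.8), we conclude, where $\rho = \rho_1 \otimes \rho_2$, the following

$$B_z(\rho_1 \otimes \rho_2) = A_{\rho_1 \otimes \rho_2}^{\dagger} \Big[ \bigoplus_{\varrho \in \hat{\mathcal{G}}} (\hat{z}(\varrho))^{\oplus m_{\rho_1,\rho_2}(\varrho)} \Big]^{\dagger} A_{\rho_1 \otimes \rho_2} \, \big( \hat{z}(\rho_1) \otimes \hat{z}(\rho_2) \big), \qquad \text{(SM-II.12)}$$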



where $A_{\rho_1 \otimes \rho_2}$ makes the equivalence (SM-II.10). From (SM-II.12), we can now describe an algorithm
that recovers the Fourier coefficients $\hat{z}(\rho)$ from those of the triple-correlation $A_z^{(3)}$ (i.e., from
$B_z(\rho_1 \otimes \rho_2)$). Then by a Fourier inversion theorem ([25], p. 100), we can obtain from $\hat{z}(\rho)$
the data $z$.

A condition will be required for the algorithm to work:

$$\text{for all } (\rho, V) \in \hat{\mathcal{G}}, \ \hat{z}(\rho) \text{ is an invertible map.} \qquad \text{(SM-II.13)}$$



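If (SM-II.13) holds, then for all $\rho_1, \rho_2 \in \hat{\mathcal{G}}$ the following quantity

$$B'_z(\rho_1 \otimes \rho_2) = A_{\rho_1 \otimes \rho_2} \, B_z(\rho_1 \otimes \rho_2) \, \big( \hat{z}^{-1}(\rho_1) \otimes \hat{z}^{-1}(\rho_2) \big) \, A_{\rho_1 \otimes \rho_2}^{\dagger} \qquad \text{(SM-II.14)}$$

is well-defined, where $A_{\rho_1 \otimes \rho_2}^{\dagger}$ is the adjoint of $A_{\rho_1 \otimes \rho_2}$ with complex conjugation. Let $(1, V)$ denote
the trivial representation whereby $1(g) = 1$ for all $g \in \mathcal{G}$. We see that

$$
\begin{aligned}
\hat{z}_1(1 \otimes \rho) &= \hat{z}(\rho), \\
\hat{z}_2(1 \otimes \rho) &= \hat{z}(1) \cdot \hat{z}(\rho),
\end{aligned}
\qquad \text{(SM-II.15)}
$$

which follows from (SM-II.11) and (SM-II.9). Then from (SM-II.12) the following algorithm$^{12}$,
under the existence of an appropriate labeling $\varrho_1, \varrho_2, \varrho_3, \ldots$ of representations in $\hat{\mathcal{G}}$ (where $\varrho_1 = 1$),
will perform the promised task.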

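Algorithm SM-II.1. To obtain Fourier coefficients $\hat{z}(\rho)$ from $B_z(\rho_1 \otimes \rho_2)$, where $\rho, \rho_1, \rho_2 \in \hat{\mathcal{G}}$.

• As $B_z(1 \otimes 1) = (\hat{z}(1))^3$ holds from (SM-II.8) and (SM-II.15), compute $\hat{z}(1) = \hat{z}(\varrho_1)$.

• As $B_z(1 \otimes \varrho_2) = \hat{z}(1) \cdot (\hat{z}(\varrho_2))^{\dagger} \hat{z}(\varrho_2)$ holds from (SM-II.8) and (SM-II.15), compute $\hat{z}(\varrho_2)$.

  - Note that since $\widehat{z^a}(\varrho_2) = \varrho_2(a)\hat{z}(\varrho_2)$ for any $a \in \mathcal{G}$, we can only determine $\hat{z}(\varrho_2)$ up to
    $\mathcal{G}$-invariance (i.e., if $\hat{z}(\varrho_2)$ solves the above expression, then so does $\varrho_2(a)\hat{z}(\varrho_2)$ for any
    $a \in \mathcal{G}$).

• For $\varrho_3, \varrho_4, \ldots$, use the following iteration derived from both (SM-II.12) and (SM-II.14). For
  $\ell \geq 3$, use

  $$B'_z(\varrho_{\ell-1} \otimes \varrho_2) = (\hat{z}(\varrho_\ell))^{\dagger} \oplus M_{\ell-1}$$

  to solve for $\hat{z}(\varrho_\ell)$, where the LHS will be known using previous computations, and where $M_{\ell-1}$ is
  the remainder term in the RHS of (SM-II.10) for $\varrho_{\ell-1} \otimes \varrho_2$, after pulling out one copy of $\varrho_\ell$.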
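The steps above can be sketched for the cyclic group of order $n$, where all representations are one-dimensional characters $\chi_k$, the labeling is $\varrho_\ell = \chi_{\ell-1}$, and the remainder term $M_{\ell-1}$ is empty. This is an illustration under our own assumptions rather than a general implementation: the triple correlation is taken to be $\sum_a z_a z_{a+g} z_{a+h}$, the data is real with all Fourier components nonzero (condition (SM-II.13)), and all function names are ours. The last step in `recover_fourier`, which fixes the free phase chosen in the second step so that $\chi_n = \chi_0$ is respected, is an addition of this sketch and is not stated in the algorithm.

```python
import cmath
import math


def character(n, k, g):
    """Irreducible representation chi_k of Z_n: chi_k(g) = exp(2*pi*i*k*g/n)."""
    return cmath.exp(2j * math.pi * k * g / n)


def triple_correlation(z):
    """Assumed form of (SM-II.4) on Z_n: A(g, h) = sum_a z_a z_{a+g} z_{a+h}."""
    n = len(z)
    return [[sum(z[a] * z[(a + g) % n] * z[(a + h) % n] for a in range(n))
             for h in range(n)] for g in range(n)]


def bispectrum(A, k, l):
    """B_z(chi_k (x) chi_l): Fourier transform of the triple correlation A."""
    n = len(A)
    return sum(A[g][h] * character(n, k, g) * character(n, l, h)
               for g in range(n) for h in range(n))


def recover_fourier(A):
    """Recover Fourier components of z (up to translation) from A alone."""
    n = len(A)
    y = [0j] * (n + 1)
    # Step 1: B_z(1 (x) 1) = zhat(1)^3, a real number for real data.
    b00 = bispectrum(A, 0, 0).real
    y[0] = complex(math.copysign(abs(b00) ** (1.0 / 3.0), b00))
    # Step 2: B_z(1 (x) rho_2) = zhat(1) * |zhat(rho_2)|^2; the phase is free, pick 0.
    y[1] = complex(math.sqrt((bispectrum(A, 0, 1) / y[0]).real))
    # Step 3: B'_z(rho_{l-1} (x) rho_2) = conj(zhat(rho_l)) in the 1-dimensional case.
    for k in range(2, n + 1):
        y[k] = (bispectrum(A, k - 1, 1) / (y[k - 1] * y[1])).conjugate()
    # Extra step (ours): chi_n = chi_0 forces y[n] = y[0]; rotate phases to close the loop.
    theta = cmath.phase(y[n] / y[0])
    return [y[k] * cmath.exp(-1j * k * theta / n) for k in range(n)]


def inverse_fourier(zhat):
    """Fourier inversion on Z_n, keeping the real part for real data."""
    n = len(zhat)
    return [sum(zhat[k] * character(n, k, g).conjugate() for k in range(n)).real / n
            for g in range(n)]


z = [3.0, 1.0, 4.0, 1.0, 5.0]
n = len(z)
z_rec = inverse_fourier(recover_fourier(triple_correlation(z)))
translates = [[z[(g - a) % n] for g in range(n)] for a in range(n)]
is_translate = any(all(abs(z_rec[g] - t[g]) < 1e-8 for g in range(n))
                   for t in translates)
```

The recovered vector `z_rec` is not `z` itself in general but one of its cyclic translates, which reflects the note in the second step: the triple correlation determines the data only up to the group action.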



Now for the final step of Algorithm SM-II.1 to work, the labeling $\varrho_1, \varrho_2, \varrho_3, \ldots$ must allow $\hat{z}(\varrho_\ell)$
to be pulled out in each $\ell$-th step. Unfortunately in general for finite groups $\mathcal{G}$, this labeling is
unknown.

12 The steps of this algorithm were not stated as clearly in previous work, hence it is valuable to record them here.

On the other hand if $\mathcal{G}$ is cyclic, the representations $(\varrho_\ell, V) \in \hat{\mathcal{G}}$, $1 \leq \ell \leq \#\mathcal{G}$, possess
a "cyclic group structure", see [25], p. 274. In particular, there exists some choice for labeling
$\varrho_1, \varrho_2, \varrho_3, \ldots$, such that we can express for any $2 \leq \ell \leq \#\mathcal{G}$

$$\varrho_\ell = \varrho_{\ell-1} \otimes \varrho_2$$

using some special choice for $\varrho_2$. Hence for finite cyclic groups, Algorithm SM-II.1 will work as long
as condition (SM-II.13) is met. Also for finite abelian groups in general, which are always isomorphic
to a direct product of a finite number of finite cyclic groups, appropriate extensions can be pursued.
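For a cyclic group of order $n$, one labeling with this property takes $\varrho_\ell = \chi_{\ell-1}$, where $\chi_k(g) = e^{2\pi i k g/n}$, so that $\varrho_1$ is trivial and $\varrho_2 = \chi_1$. The short check below verifies the relation numerically; this particular labeling and the names `character` and `rho` are our illustrative choices, and other labelings may work equally well.

```python
import cmath
import math


def character(n, k, g):
    """Irreducible representation chi_k of Z_n: chi_k(g) = exp(2*pi*i*k*g/n)."""
    return cmath.exp(2j * math.pi * k * g / n)


n = 6


def rho(l, g):
    """Labeling rho_l = chi_{l-1}, for l = 1..n (rho_1 is the trivial representation)."""
    return character(n, l - 1, g)


# rho_l = rho_{l-1} (x) rho_2 for 2 <= l <= n; the tensor product of
# 1-dimensional representations is an ordinary product of complex numbers.
labeling_ok = all(
    abs(rho(l, g) - rho(l - 1, g) * rho(2, g)) < 1e-12
    for l in range(2, n + 1) for g in range(n)
)

# One more tensor product with rho_2 wraps around to the trivial representation.
wraps_around = all(
    abs(rho(n, g) * rho(2, g) - rho(1, g)) < 1e-12 for g in range(n)
)
```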



In conclusion, Algorithm SM-II.1 is a constructive proof of a completeness result (under the above
appropriate conditions): $A_{z'}^{(3)} = A_z^{(3)}$ if and only if $z'$ is obtainable from $z$ by the action of some $g \in \mathcal{G}$.



Using the condition (SM-II.13), Kakarala proved a remarkable completeness result in the same
vein, for the large class of compact groups (which also includes some infinite groups, under
appropriate generalization of the vector space $\mathbb{R}[\mathcal{G}]$, the Fourier transform in Definition SM-II.1
and the dual $\hat{\mathcal{G}}$; see [17] for details).

Theorem SM-II.1. (c.f., [17]) Let $\mathcal{G}$ be a compact group, and let $\hat{\mathcal{G}}$ be its dual. Let $z$ be an
arbitrary function in $\mathbb{R}[\mathcal{G}]$, for which we assume that condition (SM-II.13) is met. Then the
triple-correlation $A_z^{(3)}$ of $z$ equals another $A_{z'}^{(3)}$ for some $z' \in \mathbb{R}[\mathcal{G}]$, if and only if $z' = z^g$ for some
$g$ in $\mathcal{G}$.
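The statement can be illustrated on a small cyclic example. The sketch below assumes the triple correlation has the form $\sum_a z_a z_{a+g} z_{a+h}$ (our reading of (SM-II.4) in additive notation) and uses sample data of our own choosing. It compares the triple correlation of $z$ with that of a translate and with that of the reflection $g \mapsto -g$, which for this data is not a translate of $z$.

```python
def triple_correlation(z):
    """Assumed form of (SM-II.4) on Z_n: A(g, h) = sum_a z_a z_{a+g} z_{a+h}."""
    n = len(z)
    return [[sum(z[a] * z[(a + g) % n] * z[(a + h) % n] for a in range(n))
             for h in range(n)] for g in range(n)]


z = [3, 1, 4, 1, 5]
n = len(z)

# A translate of z by the group element 2, and the reflection g -> -g of z.
z_translate = [z[(g - 2) % n] for g in range(n)]
z_reflect = [z[(-g) % n] for g in range(n)]

A = triple_correlation(z)
same_for_translate = triple_correlation(z_translate) == A
differs_for_reflection = triple_correlation(z_reflect) != A

# Sanity check that the reflection really is not one of the n translates of z.
translates = [[z[(g - a) % n] for g in range(n)] for a in range(n)]
reflection_is_translate = z_reflect in translates
```

With integer data the comparison is exact: the translate has the same triple correlation, while the reflection does not, consistent with the "if and only if" in the theorem.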



Unfortunately Kakarala's proof is non-constructive, and we still do not know how to run
Algorithm SM-II.1 for general groups (but see [17] for an algorithm that works for the group
of all $2 \times 2$ unitary matrices with determinant $+1$). The proof of Theorem SM-II.1 relies on
Tannaka-Krein duality (Proposition 1, [18], p. 199).



Note the following important points. Theorem SM-II.1 only requires condition (SM-II.13)
(i.e., it does not require the labeling $\varrho_1, \varrho_2, \varrho_3, \ldots$), and one seems to be able to satisfy it by a slight
perturbation of the data. This is misleading, as Kondor pointed out ([18], pp. 89-90): for extensions as in
(SM-II.5), i.e., when $z$ is the extension of some $a \in \mathbb{R}[\mathcal{X}]$ on a general homogeneous $\mathcal{G}$-space $\mathcal{X}$, the condition (SM-II.13)
turns out to be mostly unsatisfied. Kakarala has yet another remarkable completeness result
for homogeneous spaces (see [17], Theorems 4.6 & 4.7); however, as Kondor also pointed out (p. 91),
this result applies only to elements in $\mathbb{R}[\mathcal{G}]$ that are constant under right cosets of $S$ (or invariant
under left $S$-translation as in [17]), as opposed to our definition (SM-II.5), which makes extensions
constant over left cosets of $S$. Hence Kakarala's result does not apply exactly to our setup.



In conclusion, there exist some powerful results (e.g., Algorithm SM-II.1 and Theorem SM-II.1)
developed for the triple-correlation. However, for general groups there is room to improve
these results; especially worthwhile would be a completeness result for homogeneous spaces for
extensions as defined in (SM-II.5).